Re: Ynt: [tesseract-ocr] Re: How to use Tesseract Arabic OCR.

Ibr Thu, 19 Dec 2019 00:10:48 -0800

>
> Hi Serkan,
>

My pleasure brother, any time :)


*"Do I need a new model for Ottoman, what do you think?"* Of course, I 
think it would help you a lot, but honestly I have no clue how to create 
trained data for Ottoman or any other language. That's why your best shot 
is probably the Farsi trained data, unless of course you know how to create 
Ottoman trained data yourself.


*"I understand that if any letter that does not have ASCII correspondence 
can not be recognized and converted to text. Right? If yes, can we say 
that those letters can never be contained in OCR?"* Theoretically yes, if I 
understand this matter correctly, although for Arabic-script letters the 
code that matters is the Unicode codepoint rather than ASCII. Why did I 
mention Unicode and ASCII in the first place? Because I have faced this 
problem before and opened an issue about it. Refer to this issue 
<https://github.com/tesseract-ocr/tesstrain/issues/128> and you can see how 
each character has its own corresponding code. That's why I asked you 
whether the Ottoman writing system is recognized by other editors such as 
MS Office. According to Shree's comment, *"If all required Ottoman 
characters do not have a Unicode codepoint, then you may have to assign 
some random letter instead"*, so it seems that any Ottoman letter without 
its own code won't be recognized.

Again, I think if you look deeper into the Farsi alphabet and compare it 
with the Ottoman alphabet, you might conclude that Farsi should do, since 
Tesseract works on characters only, not on meaning. Unfortunately I can't 
help you with this, since I know only a little Farsi. You need someone 
specialized in Farsi or a native speaker, such as an Iranian or an 
Azerbaijani.
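
If it helps, here is a small sketch of how you could check this yourself 
with plain Python. It only uses the standard `unicodedata` module. The 
function names and the sample letters are my own picks for illustration, 
not complete alphabets, so replace them with the real Farsi and Ottoman 
letter lists you want to compare.

```python
import unicodedata


def describe(text):
    """Return (character, 'U+XXXX', Unicode name) for each character in text."""
    rows = []
    for ch in text:
        code = "U+{:04X}".format(ord(ch))
        # Characters without an assigned name get a placeholder instead of an error.
        name = unicodedata.name(ch, "<no Unicode name>")
        rows.append((ch, code, name))
    return rows


def missing_from(alphabet, needed):
    """Return the characters in `needed` that do not occur in `alphabet`."""
    return sorted(set(needed) - set(alphabet))


# Illustrative samples only (assumption), written as escapes to avoid
# right-to-left display problems in the editor.
farsi_sample = "\u067e\u0686\u0698\u06af"   # peh, tcheh, jeh, gaf
ottoman_sample = "\u067e\u0686\u06af\u06ad"  # peh, tcheh, gaf, ng

for ch, code, name in describe(ottoman_sample):
    print(code, name)

# Letters the Ottoman sample needs that the Farsi sample does not contain.
print([hex(ord(ch)) for ch in missing_from(farsi_sample, ottoman_sample)])
```

Any letter that shows up in the last list does have a Unicode codepoint 
but is not in the Farsi sample, so you would still need to check whether 
the Farsi trained data actually covers it.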

Good thing that Shree is here. He is an expert in this matter and helpful 
as well, especially now that Unicode and ASCII representation and creating 
trained data have been brought to the table. He knows this stuff better 
than I do.

Again, you should pay attention to the quality of the images. Some images 
might not give good results simply because of imperfections in the images 
themselves, such as old lines or dots, so some image enhancement will give 
better results.
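
To give an idea of what I mean by enhancement, here is a rough sketch of 
one kind of cleanup: removing isolated specks from a black-and-white page. 
It is not what Tesseract does internally. To keep it runnable without any 
imaging library, I am assuming the page is already a list of rows of 0 and 
1 (1 = ink), and `remove_isolated_dots` is just a name I made up. With 
real scans you would do the same kind of thing with an image library.

```python
def remove_isolated_dots(bitmap):
    """Return a copy of bitmap where ink pixels with no ink neighbours are cleared.

    bitmap is a list of rows, each row a list of 0 (background) or 1 (ink).
    Only single-pixel specks are removed; anything touching other ink stays.
    """
    height = len(bitmap)
    width = len(bitmap[0]) if height else 0
    cleaned = [row[:] for row in bitmap]  # do not modify the caller's data

    for y in range(height):
        for x in range(width):
            if bitmap[y][x] != 1:
                continue
            # Count ink pixels in the 8 surrounding positions.
            neighbours = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        neighbours += bitmap[ny][nx]
            if neighbours == 0:
                cleaned[y][x] = 0
    return cleaned


page = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],  # two connected ink pixels: kept
    [0, 0, 0, 0, 1],  # a lone speck: removed
    [0, 0, 0, 0, 0],
]
cleaned_page = remove_isolated_dots(page)
```

Be careful with this on Arabic-script text: the dots above and below the 
letters carry meaning, so I would only remove specks that are clearly 
smaller than a real dot.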

