Re: Ynt: [tesseract-ocr] Re: How to use Tesseract Arabic OCR.

Ibr Tue, 24 Dec 2019 07:20:56 -0800

Hi Serkan,

If the Ottoman letters have codes to represent them, then yes, it's doable.
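
A quick way to check that is to ask Python's standard unicodedata module about each letter. The snippet below is only a rough sketch: the sample letters (peh, tcheh, jeh, gaf, ng) are my own pick of letters that I believe Ottoman uses beyond basic Arabic, and the helper names are made up for illustration. Replace the sample with the full alphabet from your font.

```python
import unicodedata

# Assumed sample, not the full Ottoman alphabet: peh, tcheh, jeh, gaf, ng.
SAMPLE_LETTERS = "\u067e\u0686\u0698\u06af\u06ad"


def describe_codepoints(text):
    """Return (char, 'U+XXXX', Unicode name or None) for each character."""
    rows = []
    for ch in text:
        # name() returns the default (None) when the character has no name.
        name = unicodedata.name(ch, None)
        rows.append((ch, f"U+{ord(ch):04X}", name))
    return rows


def missing_codepoints(text):
    """Return characters that are unassigned (Cn) or private-use (Co)."""
    return [ch for ch in text if unicodedata.category(ch) in ("Cn", "Co")]


if __name__ == "__main__":
    for ch, code, name in describe_codepoints(SAMPLE_LETTERS):
        print(code, name)
    print("without a standard code point:", missing_codepoints(SAMPLE_LETTERS))
```

If missing_codepoints() returns anything for the text typed with the Ottoman TTF, those letters probably sit in a private-use area of the font, which is the case Shree's comment in the quoted message below covers.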


On Friday, December 20, 2019 at 12:06:56 AM UTC+2, Serkan Taş wrote:

> Hi Ibrahim,
>
> Following Shree's advice, I am going to work on training for some time. 
> Before that, of course, I am going to work on the alphabet and other 
> symbols in the Arabic and Farsi datasets that are common with Ottoman. I am 
> still not sure how to fine-tune an existing dataset, but I am going to try 
> to understand it.
>
> As for MS Word: when I install a TTF font prepared for the Ottoman 
> alphabet, yes, I can see all 34 letters of Ottoman in a document.
>
> On Thu, Dec 19, 2019 at 11:10 AM Ibr <ibr....@gmail.com> wrote:
>
>> Hi Serkan,
>>
>> My pleasure brother, any time :)
>>
>> *"Do I need a new model for Ottoman, what do you think?"* Of course, I 
>> think it would help you a lot, but honestly I have no clue how to create 
>> trained data for Ottoman or any other language. That's why your best shot 
>> is probably the Farsi trained data, unless of course you know how to 
>> create Ottoman trained data.
>>
>>
>> *"I understand that if any letter that does not have ASCII correspondence 
>> can not be recognized and converted to text. Right ? if yes can we say 
>> that that letters can never be contained in OCR ?"* Theoretically yes, if 
>> I understand this matter correctly. Why did I mention Unicode and ASCII in 
>> the first place? Because I have faced this problem before and opened an 
>> issue about it; see this issue 
>> <https://github.com/tesseract-ocr/tesstrain/issues/128>, where you can see 
>> how each character has its own corresponding code. That's why I asked 
>> whether the Ottoman writing system is recognized by other editors such as 
>> MS Office.
>>
>> According to Shree's comment, *"If all required Ottoman characters do not 
>> have a Unicode codepoint, then you may have to assign some random letter 
>> instead"*, it seems that any Ottoman letter without its own code won't be 
>> recognized. Again, I think that if you look deeper into the Farsi alphabet 
>> and compare it with the Ottoman alphabet, you might conclude that Farsi 
>> should do, since Tesseract doesn't work on meaning, only on characters. 
>> Unfortunately I can't help you with this, since I know only a little 
>> Farsi; you need someone specialized in Farsi or a native speaker, such as 
>> an Iranian or Azerbaijani.
>>
>> It's a good thing that Shree is here. He is an expert in this matter and 
>> helpful as well, especially since we have brought Unicode and ASCII 
>> representation and creating trained data to the table; he knows this stuff 
>> better than I do.
>>
>> Again, you should pay attention to the quality of the images. Some images 
>> might give poor results because of imperfections in the images themselves, 
>> such as old lines or dots, so some image enhancement will give better 
>> results.
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to tesser...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/5b32cb1c-65f1-4fc8-a763-fc42e9d58cca%40googlegroups.com
>> .
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/cafa2e12-24d2-4080-9347-3f5204050de1%40googlegroups.com.
