Hi Ibrahim,

Following Shree's advice, I am going to work on training for some time.
Before that, of course, I will go through the alphabet and other symbols in
the Arabic and Farsi datasets that are shared with Ottoman. I am still not
sure how to fine-tune the existing trained data, but I will try to
understand it.

As for MS Word: when I install a TTF prepared for the Ottoman alphabet, yes,
I can see all 34 Ottoman letters in a document.
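
Seeing a letter rendered in Word depends on the font, so a separate check is whether each letter has an assigned Unicode code point. Below is a minimal sketch of such a check, assuming Python 3 and its standard unicodedata module. The five letters are only an illustrative sample, not the full 34-letter Ottoman alphabet, and the function names are made up for this example.

```python
import unicodedata

# Illustrative sample only (an assumption, NOT the full Ottoman alphabet):
# four letters shared with Farsi plus U+06AD.
SAMPLE_LETTERS = ["\u067e", "\u0686", "\u0698", "\u06af", "\u06ad"]


def describe(ch):
    """Return the code point as 'U+XXXX' and the Unicode name (None if unnamed)."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch, None)


def letters_without_name(letters):
    """Return the letters that have no Unicode character name.

    Unassigned and private-use code points have no name, so a letter that
    shows up here has no standard code point to use in training text.
    """
    return [ch for ch in letters if unicodedata.name(ch, None) is None]


if __name__ == "__main__":
    for ch in SAMPLE_LETTERS:
        code, name = describe(ch)
        print(code, name or "<no Unicode name>")
    print("letters without a name:", letters_without_name(SAMPLE_LETTERS))
```

A letter that only exists as a glyph in a custom TTF (for example mapped to a private-use code point) would appear in the second list, which is the case Shree's comment is about.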

On Thu, Dec 19, 2019 at 11:10 AM Ibr <ibr.ham...@gmail.com> wrote:

> Hi Serkan,
>
> My pleasure brother, any time :)
>
> *"Do I need a new model for ottoman, what you think ?"* Of course, I
> think it would help you a lot, but honestly I have no idea how to create
> trained data for Ottoman or any other language. That is why your best shot
> is probably the Farsi trained data, unless of course you know how to
> create Ottoman trained data.
>
>
> *"I understand that if any letter that does not have ASCII correspondence
> can not be recognized and converted to text. Right ? if yes can we say
> that that letters can never be contained in OCR ?"* Theoretically yes, if
> I understand this correctly. Why did I mention Unicode and ASCII in the
> first place? Because I have faced this problem before and opened an issue
> about it: see
> <https://github.com/tesseract-ocr/tesstrain/issues/128>, where you can see
> how each character has its own corresponding code. That is why I asked
> whether the Ottoman writing system is recognized by other editors such as
> MS Office. According to Shree's comment, *"If all required Ottoman
> characters do not have a Unicode codepoint, then you may have to assign
> some random letter instead"*, it seems that any Ottoman letter without its
> own code will not be recognized.
>
> Again, I think that if you look more closely at the Farsi alphabet and
> compare it with the Ottoman alphabet, you may conclude that Farsi will do,
> since Tesseract works on characters only, not on meaning. Unfortunately I
> cannot help you with this, since I know only a little Farsi; you need a
> Farsi specialist or a native speaker, such as an Iranian or an
> Azerbaijani.
>
> It is a good thing that Shree is here. He is an expert in this area and
> helpful as well, especially now that Unicode and ASCII representation and
> creating trained data have been brought to the table; he knows these
> things better than I do.
>
> Again, you should pay attention to the quality of the images. Some images
> may give poor results simply because of imperfections in the images
> themselves, such as old lines or dots, so some image enhancement should
> give better results.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/5b32cb1c-65f1-4fc8-a763-fc42e9d58cca%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/5b32cb1c-65f1-4fc8-a763-fc42e9d58cca%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
