You can try to fine-tune tessdata_best/script/Arabic.traineddata for Ottoman.

If you have line images and their ground-truth transcriptions, you can use
the Makefile-based training process from tesstrain.
See https://github.com/tesseract-ocr/tesstrain/issues/128
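Before running the tesstrain Makefile, it can help to confirm that every line
image has a matching transcription. The sketch below assumes the layout
described in the tesstrain README, where each line image (.tif or .png) sits
next to a text file with the same base name and a .gt.txt extension. The
function name find_unpaired and the example directory name are hypothetical,
chosen only for illustration.

```python
from pathlib import Path

# Image extensions assumed here; adjust if your line images use another format.
IMAGE_SUFFIXES = {".tif", ".png"}
GT_SUFFIX = ".gt.txt"


def find_unpaired(ground_truth_dir):
    """Return (images_without_text, texts_without_image) as sorted base names."""
    root = Path(ground_truth_dir)
    images = set()
    texts = set()
    for path in root.iterdir():
        if path.name.endswith(GT_SUFFIX):
            # Strip the full ".gt.txt" suffix to get the shared base name.
            texts.add(path.name[: -len(GT_SUFFIX)])
        elif path.suffix.lower() in IMAGE_SUFFIXES:
            images.add(path.stem)
    return sorted(images - texts), sorted(texts - images)


if __name__ == "__main__":
    # Hypothetical directory name; tesstrain derives it from MODEL_NAME.
    target = Path("data/ottoman-ground-truth")
    if target.is_dir():
        missing_text, missing_image = find_unpaired(target)
        print("images without transcription:", missing_text)
        print("transcriptions without image:", missing_image)
```

If both lists come back empty, the directory is at least consistently paired.
Whether the transcriptions themselves are correct still has to be checked by
eye.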

Tesseract outputs recognized text as Unicode code points (UTF-8 text). If
some of the required Ottoman characters do not have a Unicode code point,
you may have to assign some other letter to stand in for each of them.
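To see which code points a transcription actually uses, a small sketch like
the one below can list them with their Unicode names. It also shows one
possible way to substitute a stand-in character for a glyph that has no code
point. The markers such as <glyph-1>, the PLACEHOLDERS table, and the choice
of a Private Use Area code point (U+E000) are assumptions made for this
example, not something Tesseract or tesstrain prescribes. An otherwise unused
ordinary letter would serve the same purpose.

```python
import unicodedata


def describe_code_points(text):
    """Map each distinct non-space character to (code point, Unicode name)."""
    result = {}
    for ch in text:
        if ch.isspace():
            continue
        # Characters without an assigned name (e.g. Private Use) get a marker.
        result[ch] = (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
    return result


# Hypothetical table: a marker typed by the transcriber -> stand-in character.
PLACEHOLDERS = {"<glyph-1>": "\uE000"}


def apply_placeholders(text, table=PLACEHOLDERS):
    """Replace each transcriber marker with its stand-in character."""
    for marker, stand_in in table.items():
        text = text.replace(marker, stand_in)
    return text


if __name__ == "__main__":
    for ch, (code, name) in describe_code_points("\u06AF\u0628").items():
        print(code, name)
```

Whatever stand-in is chosen has to be used consistently in every ground-truth
file, and mapped back to the intended glyph after recognition.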

On Wed, Dec 18, 2019 at 11:34 PM Serkan Taş <serkan....@gmail.com> wrote:

> Hi Ibrahim,
>
> You helped me so much, and I have new questions :)
>
> 1. Do I need a new model for Ottoman? What do you think?
>
> 2. From your comment I understand that any letter that does not have an
> ASCII correspondence cannot be recognized and converted to text. Right?
> If so, can we say that those letters can never be included in OCR output?
