You can try to finetune tessdata_best/script/Arabic.traineddata for Ottoman.
If you have line images and their ground-truth transcriptions, you can use the Makefile-based process from tesstrain. See https://github.com/tesseract-ocr/tesstrain/issues/128

Tesseract outputs recognized text as Unicode code points (UTF-8 text). If some of the required Ottoman characters do not have a Unicode code point, you may have to assign some other letter to stand in for them.

On Wed, Dec 18, 2019 at 11:34 PM Serkan Taş <serkan....@gmail.com> wrote:

> Hi Ibrahim,
>
> You helped me so much, and I have new questions :)
>
> 1. Do I need a new model for Ottoman, what do you think?
>
> 2. From your comment I understand that any letter that does not have an
> ASCII correspondence cannot be recognized and converted to text. Right?
> If yes, can we say that those letters can never be contained in OCR?
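
On the second question, one way to find out which characters would need a stand-in is to audit the ground-truth transcriptions before training. The following is only a rough sketch using Python's standard unicodedata module. The function name audit_characters is made up for illustration and is not part of Tesseract or tesstrain. It assumes that anything in the text that is a private-use or unassigned code point (general category Co or Cn) is a placeholder for a character with no standard Unicode code point.

```python
import unicodedata
from collections import Counter


def audit_characters(text):
    """Return one (code point, name, count, needs_substitute) tuple per
    distinct non-whitespace character in a ground-truth transcription."""
    counts = Counter(ch for ch in text if not ch.isspace())
    report = []
    for ch, n in sorted(counts.items()):
        category = unicodedata.category(ch)
        # Assumption: Cn (unassigned) and Co (private use) mark characters
        # that have no standard Unicode code point and need a stand-in.
        needs_substitute = category in ("Cn", "Co")
        name = unicodedata.name(ch, "<no name>")
        report.append((f"U+{ord(ch):04X}", name, n, needs_substitute))
    return report


if __name__ == "__main__":
    # Hypothetical sample: two ordinary Arabic-script letters plus one
    # private-use placeholder.
    for row in audit_characters("\u06af\u06af \ue000"):
        print(row)
```

Running this over all the .gt.txt files would give a list of the characters the model has to learn, and any row flagged as needing a substitute is a candidate for mapping to some other letter, as suggested above.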