Hi Serkan,

Here is roughly how Tesseract works. Each language or writing system has its own model, which Tesseract relies on to recognize the characters in an image. I guess it depends on something like stroke width transform, which detects shapes (as far as I know, the Tesseract 4 engine itself is an LSTM neural network). While scanning an image, when it finds a shape (a letter) that it already knows, it assigns the corresponding letter and writes it to the output text, then moves on to the next shape, and so on. A model in ML is like a brain that decides the result based on the input.

Why am I telling you all of this? To give you an idea of how it works, and to let you know that you can't be conclusive about the results. Even with great accuracy you may still get some errors. That is how machine learning works in general, and it is why people usually train the model further to improve its accuracy.

About the Ottoman writing system: you said "The language is Turkish originally". Tesseract doesn't care about the meaning of the text, only the shapes. You also said the "alphabet is some kind of mixture of Arabic and Farsi alphabet". I'm a native Arabic speaker, yet I can read the first image you shared, "ödev", without knowing what it means (except for a few Turkish words I already know). I can read Farsi as well, but the problem with the Farsi alphabet is that it contains extra letters that don't exist in Arabic, very close but slightly different. For example, (چ) is the same as (ج) but with three dots. In Farsi both letters exist, but in Arabic only the second one does. So if you run the two letters through the Farsi model, it will work fine, but with the Arabic model I think both will be recognized as the letter with one dot.

Arabic has 28 letters and Farsi has 32, I guess. That means if the Ottoman alphabet contains letters from Farsi, the Arabic model won't be enough, since Farsi contains the Arabic letters plus some extra ones. If the Ottoman and Farsi alphabets are the same, the Farsi model (I think it's fas.traineddata <https://github.com/tesseract-ocr/tessdata/blob/master/fas.traineddata>) will surely work fine. But if some letters in the Ottoman alphabet don't exist in Farsi, those letters won't be recognized, or will be recognized wrongly.

About the font: I'm not sure which font is used in the two pictures, but the one in the first picture definitely exists in the Arabic model. Tesseract 4 (at least when I last used it, almost a year ago) included, I think, 5000 Arabic fonts, which covers almost all fonts, so I don't think you would need any training on different fonts.

Last thing: when I used Tesseract, it gave perfect results for Arabic and Japanese on formal documents, but for handwritten documents the accuracy was really low. I don't know if this is still the case, but if it is, handwriting won't give good results. For example, the second image you shared, "sample01", I assure you won't be recognized even if you had an Ottoman model. The first one I'm not sure about. I think it would be recognized, but any word with a small gap in it, due to the document being old, will come out split in two. To be honest, you won't know for sure until you try it in Tesseract.

Tesseract has been easy to use since version 4, especially because it's not necessary to train the model on new fonts. So in my opinion, either open a question on this Google group or on GitHub asking whether there is an Ottoman model, or, since you seem to know this stuff, decide for yourself whether the Farsi model will do and try it.

I hope this was helpful. I went into a lot of detail, but only to give you the full picture of what's going on, so you can decide whether it fits, since I don't have enough information about the Ottoman writing system. If you still have any questions, I'm here to help :)

teşekkürler :)
Ibrahim
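
P.S. In case it helps with trying this out, here is a minimal Python sketch of the two checks I described: looking for Farsi-only letters in a piece of text, and running the same image through the Farsi and Arabic models. It assumes the tesseract command-line tool is installed together with the fas and ara traineddata files. The function names (farsi_only_letters, build_command, run_ocr) are made up for illustration and are not part of Tesseract, and the letter set is only the four letters Farsi adds to the Arabic alphabet, not what the models actually contain.

```python
import shutil
import subprocess

# The four letters Farsi adds on top of the 28 Arabic letters.
# This is an illustrative set, not read from any traineddata file.
FARSI_EXTRA_LETTERS = {"پ", "چ", "ژ", "گ"}


def farsi_only_letters(text):
    """Return the Farsi-only letters that occur in `text`."""
    return {ch for ch in text if ch in FARSI_EXTRA_LETTERS}


def build_command(image_path, lang="fas"):
    """Build the tesseract CLI call for one image and one language.

    'stdout' as the output base makes tesseract print the recognized
    text instead of writing it to a file. Languages can be combined
    with '+', for example 'fas+ara'.
    """
    return ["tesseract", image_path, "stdout", "-l", lang]


def run_ocr(image_path, lang="fas"):
    """Run tesseract and return its text output, or None if it is not installed."""
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        build_command(image_path, lang),
        capture_output=True,
        text=True,
        encoding="utf-8",
    )
    return result.stdout
```

You could call run_ocr("your_image.png", "fas") and run_ocr("your_image.png", "ara") (the file name is a placeholder) and compare the two outputs. If farsi_only_letters() finds letters in the Farsi output that never show up in the Arabic output, that would support using the Farsi model.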