I think not. I call tesseract 5.03 from Python under Win 8 to recognize Kannada text. Recognition quality is acceptable, at about 80%. However, some symbols are split into two halves: one half is recognized correctly and the other is replaced by ಲ. For example, ಕಾಂ (one char) is recognized as ಕಾಲ (two chars), ನಿಂ is recognized as ನಿಲ, and so on, although the separate chars ಕಾ, ನಿ, ... are recognized correctly.

I unpacked the .unicharset file from kan.traineddata and tried to correct the characters' parameters. For each pair, I summed the widths of both chars, added a small gap, and put the result into the min/max width (with some deviation). I also adjusted the other min/max parameters, taking values from chars that are recognized well. After that I overwrote the unicharset in the existing traineddata and saw no difference. I tried many different values and recognition never changed. In the end I put ten zeros (0,0,0,0,...) into the parameters of the ಲ char, and the result is the same (ಲ is recognized as usual).
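A minimal sketch of the unpack and overwrite steps described above, driven from Python. It assumes the `combine_tessdata` tool that ships with Tesseract is on PATH; the helper names and the `kan.` output prefix are illustrative and not taken from the original setup.

```python
import subprocess


def unpack_cmd(traineddata, prefix):
    """Build the command that extracts all components of a traineddata file.

    With prefix "kan." the components land in files such as kan.unicharset.
    """
    return ["combine_tessdata", "-u", str(traineddata), str(prefix)]


def overwrite_cmd(traineddata, component):
    """Build the command that writes an edited component back into the file."""
    return ["combine_tessdata", "-o", str(traineddata), str(component)]


def run(cmd):
    """Run a command and raise CalledProcessError if the tool reports failure."""
    subprocess.run(cmd, check=True)


# Intended usage (requires Tesseract's tools to be installed):
#   run(unpack_cmd("kan.traineddata", "kan."))
#   ... edit kan.unicharset by hand ...
#   run(overwrite_cmd("kan.traineddata", "kan.unicharset"))
```

Traineddata files for Tesseract 4 and later can also contain a separate `lstm-unicharset` component next to the legacy `unicharset`. Which one the engine reads depends on the OCR engine mode in use, so it may be worth checking which of the two files was actually edited.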
It seems that in the new version of tesseract, recognition quality doesn't depend on the unicharset parameters. So how can I tune tesseract? Are there any other ways to control it? I don't want to retrain tesseract from scratch because I don't have a large text containing all the characters (my unicharset has 2851 chars).

On the other hand, I noticed that only chars with a Unicode length of 1 or 2 bytes are recognized correctly. Chars with a length of 3 or more bytes are not always recognized. Are there any additional parameters that remove the limit on the number of bytes per symbol?
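To make the "length" of the problem characters concrete, here is a small sketch that breaks the examples from this post into Unicode code points. It assumes that the 1/2/3 "bytes" above actually means the number of code points per unicharset entry, since every Kannada code point takes 3 bytes in UTF-8. The `describe` helper is illustrative.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name, UTF-8 byte count) for each code point."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), len(ch.encode("utf-8")))
        for ch in text
    ]


# Pairs of (expected text, text that tesseract returned), taken from the post.
pairs = [("ಕಾಂ", "ಕಾಲ"), ("ನಿಂ", "ನಿಲ")]

for expected, recognized in pairs:
    print(f"expected {expected}: {len(expected)} code points")
    for row in describe(expected):
        print("   ", row)
    print(f"recognized {recognized}: {len(recognized)} code points")
    for row in describe(recognized):
        print("   ", row)
```

In both pairs only the last code point differs: the expected text ends in the anusvara sign (U+0C82) and the recognized text ends in the letter LA (U+0CB2). So the misread piece looks like the anusvara being taken for ಲ, not a limit on bytes per symbol.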

