Sorry, the tesseract version is 3.05.01.

On Friday, 25 August 2017 at 2:06:52 UTC+7, Yury wrote:
>
> I think not.
>
> I call tesseract 5.03 from Python under Windows 8 to recognize Kannada
> text.
> Recognition quality is fine, at about 80%. However, some symbols are
> split into two halves. One half is correct; the other is replaced by ಲ.
> Example: ಕಾಂ (one char) is recognized as ಕಾಲ (two chars), ನಿಂ is
> recognized as ನಿಲ, and so on, although the separate chars ಕಾ, ನಿ, ... are
> recognized correctly.
> I unpacked the .unicharset file from kan.traineddata and tried to correct
> the characters' parameters.
> I summed the widths of both chars in each pair, added some gap, and put
> the result into min/max width (with some deviation). I also adjusted the
> other min/max parameters, using the values of chars that are recognized
> well.
> After that I overwrote the unicharset in the existing traineddata and saw
> no difference.
> I tried many values and saw no change in recognition.
> In the end I put ten zeros (0,0,0,0,...) in the parameters of the ಲ char.
> The result is the same (ಲ is recognized as usual).
>
> I think that in the new version of tesseract, recognition quality does
> not depend on the unicharset parameters.
>
> So, how can I tune tesseract?
> Are there any other ways to control tesseract?
> I don't want to retrain tesseract from scratch, because I don't have a
> large text containing all the characters (my unicharset has 2851 chars).
>
> On the other hand, I noticed that only chars with a Unicode length of 1
> or 2 bytes are recognized correctly. Characters with a length of 3 or
> more bytes are not always recognized.
> Are there any additional parameters that remove the limit on the number
> of bytes per symbol?
>
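To make the "length" question above concrete, here is a small Python sketch that lists the code points in the sequences quoted in the message. It uses only the standard library; the helper name `describe` is made up for illustration and is not part of tesseract. Every code point in the Kannada block takes 3 bytes in UTF-8, so the "1 or 2 bytes" in the message probably refers to the number of code points per unicharset entry rather than to bytes, but that reading is my assumption.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name, UTF-8 byte count) for each code point."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"), len(ch.encode("utf-8")))
        for ch in text
    ]


# Sequences quoted in the message: expected text and what was recognized.
samples = ["ಕಾಂ", "ಕಾಲ", "ನಿಂ", "ನಿಲ"]

for sample in samples:
    parts = describe(sample)
    total_bytes = sum(n_bytes for _, _, n_bytes in parts)
    # Print code points rather than glyphs so this also works on consoles
    # that cannot display Kannada.
    print(f"{len(parts)} code points, {total_bytes} UTF-8 bytes")
    for code_point, name, n_bytes in parts:
        print(f"  {code_point}  {name}  ({n_bytes} bytes)")
```

Comparing the listings for an expected sequence and its recognized counterpart shows which code point differs, which may help when looking for the matching entries in the unpacked unicharset.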