I find this question very interesting too. I am also trying to achieve good results with digits-only or Cyrillic-capital-letters-only recognition, and what I see looks really strange. After training, lstmeval reports perfect results (lstmeval stops at a 0.0001% error rate after several hours of work), but when I run standard recognition, the result is far from what I expected (actual quality on handwritten text is ~60%).
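For anyone who wants to check a number like that ~60% independently of lstmeval, here is a minimal sketch of one way to measure character-level accuracy between a ground-truth string and the recognized text. It assumes plain Levenshtein edit distance as the error measure, which may differ from what lstmeval computes internally; the function names `levenshtein` and `char_accuracy` are made up for illustration and are not part of Tesseract.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insert, delete, substitute each cost 1)."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def char_accuracy(truth: str, recognized: str) -> float:
    """Hypothetical helper: 1 - (edit distance / ground-truth length), floored at 0."""
    if not truth:
        return 1.0 if not recognized else 0.0
    errors = levenshtein(truth, recognized)
    return max(0.0, 1.0 - errors / len(truth))


if __name__ == "__main__":
    # Made-up example strings, not real recognition output.
    print(char_accuracy("0123456789", "0123456789"))
    print(char_accuracy("0123456789", "0128456789"))
```

Running the same measure on the text lstmeval was given and on the text produced by normal recognition of the full page should show whether the gap comes from the model or from the page segmentation step.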
I tested it with my own tool, which helps me draw boxes and assembles training images from scanned pages, and with another tool that tests the training results using the same box and template files.

On Thursday, May 31, 2018 at 14:13:43 UTC+3, Julien Jemine wrote:
>
> Hi,
>
> I've trained a LSTM model for a custom language from scratch as explained
> here
> <https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00>.
>
> The language only has about 100 words and 17 characters, so it's pretty
> simple.
>
> When I run lstmeval on my model, I get a perfect match:

