Hi, I am interested in evaluating the performance of Tesseract on a domain-specific test set. I would like to establish a baseline using the default (vanilla) settings, and then compare it with a run that uses domain-specific user-words and user-patterns, as documented here <https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage>. Is it possible to reuse the OCR evaluation step that must already be performed during model training, so that I can calculate word and character error rates on new (domain-specific) documents?
If this is not possible, I could synthesise my own scan images from documents using ImageMagick <https://gist.github.com/ThisIsBenny/1e669954d0fd0a945e38d0670c670c3c>, but it would be good if anyone could recommend a standard algorithm or library for calculating character and word error rates.

Thanks in advance,
Matt
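Character and word error rates are conventionally computed from the Levenshtein (edit) distance between the ground-truth text and the OCR output, divided by the length of the ground truth in characters (CER) or in words (WER). Below is a minimal Python sketch, not taken from Tesseract itself; the function names `levenshtein`, `cer` and `wer` are hypothetical, and it assumes plain-text ground truth with no case, punctuation or whitespace normalisation beyond splitting words on whitespace.

```python
def levenshtein(ref, hyp):
    """Minimum number of insertions, deletions and substitutions that turn
    ref into hyp. Works on any sequences: strings give a character-level
    distance, lists of words give a word-level distance."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty hyp needs i deletions
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute r -> h (free if equal)
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edit distance / reference length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edit distance / number of reference words.
    Words are assumed to be whitespace-separated tokens."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference text must contain at least one word")
    return levenshtein(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    print(cer("kitten", "sitting"))                          # 3 edits / 6 chars
    print(wer("the quick brown fox", "the quick brown box"))  # 1 edit / 4 words
```

Because insertions are counted, either rate can exceed 1.0 when the OCR output is much longer than the ground truth. Whatever normalisation is chosen (case folding, collapsing whitespace, stripping punctuation) should be applied identically to the baseline run and to the user-words/user-patterns run, otherwise the two sets of numbers are not comparable.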

