Hey, I am trying to train a Tesseract model that recognizes only a specific set of UTF-8 symbols and ignores everything else, including English letters. I see two ways to do this: fine-tune the standard English model and then blacklist the English characters, or train a completely new model that only knows these symbols. Which of these, or some other method, is better in terms of ease, training time, and accuracy?
For context, I will have various pieces of text on a board, and I want to find the locations of the symbols. I don't want to recognize any of the English or anything else, as this may interfere with my post-processing. I have tried a few workarounds (like restricting where the symbols can appear on the board and then only scanning those strips), but I am not satisfied with the results. I can control the font, the text size, and everything else on the board, except the actual codes. Thanks for the help!
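
To make the strip restriction concrete, here is a minimal sketch of that kind of filtering. It assumes the OCR step already yields (text, x, y, w, h) tuples in pixel coordinates. The names `Detection`, `ALLOWED`, `STRIPS`, and `filter_detections`, the example symbols, and the strip positions are all hypothetical placeholders, not anything from Tesseract itself.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Detection(NamedTuple):
    """One recognized token and its bounding box (hypothetical shape)."""
    text: str
    x: int
    y: int
    w: int
    h: int


# Hypothetical symbol set; replace with the real codes used on the board.
ALLOWED = frozenset("★◆●")

# Hypothetical horizontal strips as (top, bottom) pixel rows, bottom exclusive.
STRIPS: List[Tuple[int, int]] = [(0, 50), (200, 250)]


def in_strips(det: Detection, strips: Iterable[Tuple[int, int]]) -> bool:
    """True if the vertical center of the box falls inside any strip."""
    center_y = det.y + det.h / 2
    return any(top <= center_y < bottom for top, bottom in strips)


def only_allowed(det: Detection, allowed: frozenset) -> bool:
    """True if the token is non-empty and every character is an allowed symbol."""
    return bool(det.text) and all(ch in allowed for ch in det.text)


def filter_detections(
    dets: Iterable[Detection],
    strips: Iterable[Tuple[int, int]] = STRIPS,
    allowed: frozenset = ALLOWED,
) -> List[Detection]:
    """Keep detections that sit in a strip and contain only allowed symbols."""
    strips = list(strips)
    return [d for d in dets if in_strips(d, strips) and only_allowed(d, allowed)]
```

This only discards unwanted output after recognition. It does not stop the model from misreading a symbol as an English letter, which is why the question of fine-tuning versus training from scratch still matters.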

