Hi, I'm having some difficulty training Tesseract on a custom font. The text I'm scanning contains control characters that I do not want in the output. I've excluded these characters from my box model, but as a result they often get recognized as another, similar-looking character instead.
Is it possible to train Tesseract not to output or recognize a given character? The options I'm considering are:

- Map the control characters to nothing.
- Map the control characters to unused Unicode characters and blacklist them.
- Pre-process the image to find and remove the symbols.

Any tips or input on the viability of these options, or a better approach, would be appreciated!

Sincerely,
Tobias S
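
P.S. To make the second option more concrete, here is a rough sketch of what I have in mind. It assumes I train the control symbols as Private Use Area code points (the specific values below are hypothetical, chosen only for illustration) and then strip them from the recognized text afterwards. I'm not sure whether blacklisting them through the tessedit_char_blacklist variable would simply push Tesseract back to the next-best similar character, so this version removes them after recognition rather than during it.

```python
# Hypothetical placeholders: Private Use Area code points that would stand in
# for the control symbols during training. These values are my own assumption,
# not a Tesseract convention.
PLACEHOLDERS = "\ue000\ue001\ue002"


def strip_placeholders(text: str, placeholders: str = PLACEHOLDERS) -> str:
    """Return the OCR text with every placeholder character removed."""
    # str.translate deletes any character whose code point maps to None.
    return text.translate({ord(ch): None for ch in placeholders})


if __name__ == "__main__":
    # Simulated OCR output containing two placeholder characters.
    recognized = "INVOICE\ue000 2011\ue001"
    print(strip_placeholders(recognized))
```

The appeal of this approach is that the symbols still get a class of their own in training, so they should not be forced onto a look-alike character, and the cleanup step is trivial. Whether Tesseract trains well on such code points is something I have not verified.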

