Hi,

I'm having some difficulties training Tesseract on a custom font. In
particular, the text I'm scanning contains control characters that I do
not want in the output. I've excluded these characters from my box
files, and as a result they are often recognized as some other,
similar-looking character instead.

Is it possible to train Tesseract to not output/recognize a character?

Options I'm considering:
- Map control characters to nothing
- Map control characters to unused Unicode characters and
blacklist them.
- Pre-process the image to find and remove the symbols.
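A possible variant of the second option, offered as a sketch rather than a tested recipe: train the control glyphs under Private Use Area placeholders (U+E000 to U+F8FF never appear in ordinary text) and delete those placeholders from the OCR result afterwards, instead of blacklisting them. My understanding is that `tessedit_char_blacklist` removes a character from the candidate set, so the classifier may fall back to the next-best match, which is the same symptom as leaving the glyphs out of training. The labels `CTRL_A`/`CTRL_B`, the helper names, and the assumption that the training tools accept PUA code points are all hypothetical; the box-line layout assumed is `<symbol> <left> <bottom> <right> <top> <page>`.

```python
"""Sketch: label control glyphs with Private Use Area placeholders during
training, then delete those placeholders from the recognized text."""

# Hypothetical labels for the control glyphs in your annotations; replace
# them with whatever names you actually use. U+E000..U+F8FF is Unicode's
# Private Use Area, so these code points should not clash with real text.
PLACEHOLDERS = {
    "CTRL_A": "\ue000",
    "CTRL_B": "\ue001",
}


def relabel_box_line(line: str, mapping: dict) -> str:
    """Rewrite one box-file line (assumed layout:
    "<symbol> <left> <bottom> <right> <top> <page>"), swapping the symbol
    for its placeholder when it is one of the control glyphs."""
    fields = line.split(" ")
    if fields[0] in mapping:
        fields[0] = mapping[fields[0]]
    return " ".join(fields)


def strip_placeholders(text: str, placeholders) -> str:
    """Remove every placeholder character from the OCR output."""
    table = {ord(ch): None for ch in placeholders}  # map each to "delete"
    return text.translate(table)


if __name__ == "__main__":
    boxes = ["H 10 10 20 30 0", "CTRL_A 22 10 30 30 0", "i 32 10 36 30 0"]
    print([relabel_box_line(b, PLACEHOLDERS) for b in boxes])
    ocr_text = "H\ue000i"  # pretend Tesseract returned a placeholder
    print(strip_placeholders(ocr_text, PLACEHOLDERS.values()))  # prints Hi
```

The point of this design is that the classifier still gets a dedicated class for each control glyph, so it has no reason to confuse them with real characters; the unwanted output is removed in a trivial post-processing step.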

Any tips or input on the viability of these options, or on a better
approach, would be appreciated!

Sincerely,
Tobias S

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en