Hi John,

On Wed, Mar 12, 2014 at 04:57:38AM -0700, John Green wrote:
> Bottom line up front: Has anyone compiled a list of common misperceptions on
> the part of tesseract? E.g.: e is often seen as o and l can be mistaken for 1,
> etc.

Tesseract has some basic information of that sort built into its training
files, which it uses to help recognition. You can see the list for English by
unpacking the English .traineddata file:

    combine_tessdata -u /path/to/eng.traineddata eng.

and then looking at the resulting eng.unicharambigs file. The trailing dot in
"eng." is part of the output prefix, so the unpacked components are named
eng.<component>. The file format is documented in the manpage
unicharambigs(5), and it's pretty straightforward.

Nick

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
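
To turn the unpacked file into a list of confusions like the one asked about, a short script along these lines might help. It is only a sketch, and it assumes the tab-separated "v1" layout described in unicharambigs(5): source count, space-separated source unichars, target count, space-separated target unichars, and a type field where 1 marks a mandatory replacement. The names Ambiguity and parse_unicharambigs are made up for illustration and are not part of Tesseract, and the sample entries are illustrative rather than copied from eng.unicharambigs.

```python
from typing import List, NamedTuple, Tuple


class Ambiguity(NamedTuple):
    source: Tuple[str, ...]  # unichars as Tesseract may have read them
    target: Tuple[str, ...]  # unichars they may really stand for
    mandatory: bool          # True when the type field is 1


def parse_unicharambigs(text: str) -> List[Ambiguity]:
    """Parse v1-style unicharambigs text (layout assumed, see lead-in)."""
    entries = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 5:
            continue  # skips the "v1" header and blank lines
        n_src, src, n_tgt, tgt, kind = fields
        source = tuple(src.split(" "))
        target = tuple(tgt.split(" "))
        # The counts are redundant, so use them as a sanity check.
        if len(source) != int(n_src) or len(target) != int(n_tgt):
            raise ValueError("count mismatch in line: %r" % line)
        entries.append(Ambiguity(source, target, kind.strip() == "1"))
    return entries


# Illustrative entries in the assumed layout, not real eng data.
SAMPLE = "v1\n2\t' '\t1\t\"\t1\n1\tm\t2\tr n\t0\n3\ti i i\t1\tm\t0\n"

if __name__ == "__main__":
    for amb in parse_unicharambigs(SAMPLE):
        kind = "mandatory" if amb.mandatory else "optional"
        print("".join(amb.source), "->", "".join(amb.target), "(" + kind + ")")
```

For a real file, read eng.unicharambigs as UTF-8 and pass its contents to parse_unicharambigs. If the file uses a different version header or layout, the field handling would need adjusting.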

