I am using tesseract 3.01 in C++ for processing document images. In general I am very satisfied with the results, but I noticed that tesseract has difficulty recognizing words that consist only of lower-case characters of approximately the same height.
Here are some examples (I am processing German documents), in the form [expected result] -> [OCR result] (average confidence value):

zur -> ZUT (83%)
unser -> 1.111391' (82%)
unsere -> UDSGTG (78%)
vom -> VOTT1 (90%)
vom -> 1'191]! (66%)

When a word contains, e.g., a 'g' or an 'h', everything is fine, though. Is this a known issue, or is there perhaps an error in my code? Alternatively, is there a way to configure tesseract to get better results? Currently I am using the default settings, OEM_DEFAULT and saveBestChoices.
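
To make the affected class of words concrete, here is a minimal sketch of how such words could be flagged. The helper name `isXHeightOnly` and the letter set are my own assumptions for illustration, not part of the tesseract API; it only handles ASCII lower-case letters (no umlauts or ß) and treats 'i' and 't' as not being x-height because of the dot and the slightly taller stem.

```cpp
#include <string>

// Hypothetical helper (not part of tesseract): returns true when every
// character in `word` is an ASCII lower-case letter without an ascender
// or descender, i.e. all glyphs have roughly the same height.
// Assumption: the set below is a rough choice for typical Latin fonts.
bool isXHeightOnly(const std::string& word) {
    static const std::string kXHeightLetters = "acemnorsuvwxz";
    if (word.empty()) {
        return false;  // an empty string is not treated as a word
    }
    for (char c : word) {
        if (kXHeightLetters.find(c) == std::string::npos) {
            return false;  // ascender, descender, upper case, digit, ...
        }
    }
    return true;
}
```

With this definition, all of the misrecognized words above (zur, unser, unsere, vom) are flagged, while a word containing a 'g' or an 'h' is not.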

