I am using Tesseract 3.01 from C++ to process document images. In general
I am very satisfied with the results, but I noticed that Tesseract has
difficulty recognizing words that consist only of lower-case characters of
approximately the same height.

Here are some examples (I am processing German documents):

[expected result] -> [OCR result] (average confidence value)

zur -> ZUT (83%)
unser -> 1.111391' (82%)
unsere -> UDSGTG (78%)
vom -> VOTT1 (90%)
vom -> 1'191]!  (66%)

When a word contains, e.g., a 'g' or an 'h', everything is fine, though.
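
To make the pattern concrete, here is a minimal C++ sketch of a check that
separates the failing words from the working ones. It is only an
illustration: the function name isXHeightOnly and the letter set are
assumptions of mine, not part of the Tesseract API, and the set covers only
ASCII lower-case letters without ascenders or descenders (no umlauts).

```cpp
#include <string>

// Hypothetical helper (not part of Tesseract): returns true if every
// character of `word` is a lower-case ASCII letter that has neither an
// ascender nor a descender, i.e. all glyphs have roughly the same height.
bool isXHeightOnly(const std::string& word) {
    // Assumed letter set; letters such as 'g', 'h', 'b', 'p' are excluded.
    static const std::string kXHeightLetters = "acemnorsuvwxz";
    if (word.empty()) {
        return false;
    }
    for (char c : word) {
        if (kXHeightLetters.find(c) == std::string::npos) {
            return false;  // ascender, descender, upper case, or non-letter
        }
    }
    return true;
}
```

With this check, "zur", "unser", "unsere" and "vom" all count as
same-height words, while a word such as "ohne" does not because of the 'h'.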

Is this a known issue, or is there possibly an error in my code?

Alternatively, is there a way to configure Tesseract to get better results?
Currently I am using the default settings, OEM_DEFAULT and saveBestChoices.

