I am scanning images with large, clear text on a grainy background.
The text itself comes out fine, but I also get a myriad of spurious
letters with a size of 3 or 5 pixels (far below the size at which
anything could be recognized accurately). I could eliminate them by
size after OCR, but by then Tesseract has already spent minutes
recognizing these characters. Could someone please point me to the
right variable(s) to tell Tesseract not to attempt recognition (and
ideally not to return boxes at the layout analysis phase) below a
certain size?

I assume that the variable in question concerns the minimum expected
height of a row (rather than of individual characters), since a dot
('.'), for example, can be quite small even within a row of
normal-sized letters.
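
To make the row-versus-character distinction concrete, here is a rough
sketch of the post-OCR size filter I would otherwise fall back on. It
is only an illustration: the box tuple layout, the function names, and
the thresholds are my own assumptions, not anything from Tesseract's
API. A small box is kept only if it sits in the same row as, and close
to, a normal-sized box, so a '.' survives while isolated specks are
dropped.

```python
from typing import List, Tuple

# Hypothetical box format: (char, left, top, right, bottom), with y growing downward.
Box = Tuple[str, int, int, int, int]


def box_height(box: Box) -> int:
    """Height of a box in pixels."""
    return box[4] - box[2]


def same_row(a: Box, b: Box) -> bool:
    """True if the vertical extents of the two boxes overlap."""
    return a[2] < b[4] and b[2] < a[4]


def horizontal_gap(a: Box, b: Box) -> int:
    """Horizontal distance between two boxes (0 if they overlap)."""
    return max(0, max(a[1], b[1]) - min(a[3], b[3]))


def filter_small_boxes(boxes: List[Box], min_height: int = 10,
                       max_gap: int = 20) -> List[Box]:
    """Drop boxes shorter than min_height unless they share a row with,
    and lie within max_gap pixels of, a box that is tall enough."""
    tall = [b for b in boxes if box_height(b) >= min_height]
    kept = []
    for b in boxes:
        if box_height(b) >= min_height:
            kept.append(b)
        elif any(same_row(b, t) and horizontal_gap(b, t) <= max_gap for t in tall):
            kept.append(b)  # e.g. a dot next to normal-sized letters
    return kept


boxes = [
    ("A", 10, 10, 30, 40),      # normal-sized letter
    (".", 32, 36, 35, 40),      # small, but in the same row as "A"
    ("x", 200, 100, 203, 104),  # isolated 4-pixel speck
]
result = filter_small_boxes(boxes)
```

This only cleans up the output; it does nothing about the minutes
already spent recognizing the specks, which is why I am looking for a
setting that applies before recognition.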

Thanks!