Thanks Andrei,

You are right, of course, but I have not yet mastered the art of
thresholding, so the only thresholding done is by Tesseract, which I
believe is simplistic: a single threshold value applied to the
entire image (i.e. not adaptive).
I also don't do any noise reduction yet.
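
To make the global-versus-adaptive distinction concrete, here is a minimal
sketch of a local-mean adaptive threshold in plain Python. It is not what
Tesseract does internally; the function name `adaptive_threshold` and the
`window` and `offset` parameters are made up for illustration, and the
default values are guesses that would need tuning on real scans.

```python
def adaptive_threshold(gray, window=15, offset=10):
    """Binarize a grayscale image against a local mean.

    gray   : list of rows, each a list of ints in 0..255
    window : side length of the square neighbourhood (hypothetical default)
    offset : how much darker than the local mean a pixel must be to count
             as ink (hypothetical default)
    Returns a new image with 0 for ink and 255 for background.
    """
    h = len(gray)
    w = len(gray[0]) if h else 0

    # Integral image with a zero border, so any window sum costs O(1).
    integral = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum

    half = window // 2
    out = [[255] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - half), min(h, y + half + 1)
        for x in range(w):
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            area = (y1 - y0) * (x1 - x0)
            total = (integral[y1][x1] - integral[y0][x1]
                     - integral[y1][x0] + integral[y0][x0])
            # Ink if the pixel is darker than (local mean - offset);
            # both sides are multiplied by area to stay in integers.
            if gray[y][x] * area < total - offset * area:
                out[y][x] = 0
    return out
```

A larger `window` follows slow changes in background brightness, while a
larger `offset` makes the test less sensitive to faint grain, at the risk of
losing thin strokes.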

In any case, telling Tesseract not to waste time on sizes known to be
too small seems like a must-do. I just need someone to let me know
the name of that elusive variable ... come on guys, I'll offer a cash
prize for that name :-)!
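
Until that variable turns up, one possible workaround is to erase tiny specks
from the binarized image before it ever reaches Tesseract. The sketch below
is an assumption-laden illustration in plain Python, not a Tesseract feature:
`remove_small_components` and `min_size` are hypothetical names, and it
expects an image that is already binary (0 for ink, 255 for background).
Note that it would also erase legitimate small marks such as periods, so the
size limit has to stay well below the punctuation size of the real text.

```python
from collections import deque


def remove_small_components(binary, min_size=6):
    """Erase 8-connected ink blobs whose bounding box is under min_size.

    binary   : list of rows, each a list of 0 (ink) or 255 (background)
    min_size : blobs with both width and height below this many pixels
               are painted over (hypothetical default)
    Returns a cleaned copy; the input is left untouched.
    """
    h = len(binary)
    w = len(binary[0]) if h else 0
    out = [row[:] for row in binary]
    seen = [[False] * w for _ in range(h)]

    for sy in range(h):
        for sx in range(w):
            if binary[sy][sx] != 0 or seen[sy][sx]:
                continue
            # Breadth-first flood fill to collect one connected blob.
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and not seen[ny][nx]
                                and binary[ny][nx] == 0):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            height = max(ys) - min(ys) + 1
            width = max(xs) - min(xs) + 1
            # Paint the blob white if it is small in both dimensions.
            if max(height, width) < min_size:
                for y, x in pixels:
                    out[y][x] = 255
    return out
```

Using the larger of width and height keeps thin but tall strokes (an "l" or
"I") while still dropping isolated grain a few pixels across.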

On Jan 21, 4:25 am, andrei_c <[email protected]> wrote:
> Not sure if I'm being helpful, but it sounds like either your input
> image is noisy or the thresholding algorithm incorrectly separated
> foreground from background. If it's the former, noise reduction of
> the original image would help. If the latter, you probably need to
> choose a thresholding algorithm more appropriate for your input image.
>
> That said, I don't know how to suppress small rows efficiently.
>
> Andrei
>
> On Jan 17, 11:55 am, patrickq <[email protected]> wrote:
>
> > I am scanning images with large, clear text but on a grainy background
> > and although I get the text fine, I also get myriads of irrelevant
> > letters with a size of 3 or 5 pixels (way below a size at which
> > anything could be recognized accurately). I could eliminate them based
> > on size post-OCR, but by then Tesseract has already spent minutes
> > recognizing these characters. Could someone please point me to the
> > right variable(s) to tell Tesseract not to attempt recognition (and
> > ideally not return boxes at the layout analysis phase) below a
> > certain size?
>
> > I assume that the variable in question concerns the minimum expected
> > height of a row (rather than of individual characters), since a dot
> > ('.'), for example, can be quite small even within a row of
> > normal-sized letters.
>
> > Thanks!
