Tesseract can improve its accuracy if it can scan the document multiple
times. The only way to get this benefit is to include the repeated scans as
separate pages of a single multi-page TIFF. You can then pick out individual
"hard" words by parsing the confidence values in the output.
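
To make the confidence parsing concrete, here is a rough Python sketch, not
a tested recipe. It assumes the OCR output has already been loaded as
parallel lists under the keys "page_num", "text" and "conf" (the shape I
believe pytesseract's image_to_data returns in dict mode). The function name
hard_words, the threshold of 60 and the sample data are all made up for
illustration.

```python
def hard_words(data, threshold=60.0):
    """Return (page, word, confidence) for words recognised below `threshold`.

    `data` is assumed to hold parallel lists under the keys
    "page_num", "text" and "conf" (hypothetical input shape, modelled
    on pytesseract's dict output).
    """
    flagged = []
    for page, text, conf in zip(data["page_num"], data["text"], data["conf"]):
        conf = float(conf)  # confidence may arrive as a string or a number
        word = text.strip()
        # Rows with empty text or a confidence of -1 are layout rows, not words.
        if word and 0 <= conf < threshold:
            flagged.append((page, word, conf))
    return flagged


# Hand-made sample standing in for the output of a two-page TIFF,
# where each page is a repeat of the same picture.
sample = {
    "page_num": [1, 1, 1, 2, 2, 2],
    "text": ["", "HELLO", "W0RLD", "", "HELLO", "WORLD"],
    "conf": ["-1", "93", "41", "-1", "95", "88"],
}

print(hard_words(sample))
```

With pytesseract installed, something like
pytesseract.image_to_data("scans.tif", output_type=pytesseract.Output.DICT)
should give a dict of this shape on newer Tesseract builds, but treat that
call and the file name as assumptions on my part. A word that is flagged on
one page and not on its repeat is a candidate "hard" word.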

On Wed, Oct 24, 2012 at 9:37 PM, Phlip <[email protected]> wrote:

> Tesseractors:
>
> We are using Tesseract for an outside-of-the-box situation - not
> scanning neatly typed documents.
>
> Our situation is a fuzzy, low-contrast picture. But - even when I use
> many image enhancements, such as leveling the colors, blurring them,
> improving the contrast, shrinking the image, etc, I still get the same
> situation.
>
> One scan will OCR correctly into text, and the next will contain
> garbage. Specifically, even the tiniest difference in image
> enhancement, such as bumping the contrast from 49% to 51%, can cause
> this effect. It's as if tesseract is sensitive to one pixel's
> difference.
>
> I'm aware this is a FAQ, and I have read all the traffic I can find on
> it. Maybe, for example, if I could declare a required font size, then
> tesseract would engage on the first correct letter, instead of the
> first stray pixel, and get the scan right more often.
>
> (Yes, we could dive into the learning system, and learn us a fuzzy
> block-capitals font. But the next input object could possibly use a
> slightly different font, so we'd be back to square-one!)
>
> So, how to get a more stable, reproducible scan?
>
> --
>   Phlip
>   http://c2.com/cgi/wiki?ZeekLand
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>

