one... pixel... difference

Phlip Wed, 24 Oct 2012 23:43:40 -0700

Tesseractors:

We are using Tesseract outside its usual use case: we are not
scanning neatly typed documents.


Our input is a fuzzy, low-contrast picture. But even when I apply
many image enhancements, such as leveling the colors, blurring,
improving the contrast, shrinking the image, etc., I still get the same
unstable result.

One scan will OCR correctly into text, and the next will contain
garbage. Specifically, even the tiniest difference in image
enhancement, such as bumping the contrast from 49% to 51%, can cause
this effect. It's as if Tesseract is sensitive to a one-pixel
difference.
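
To make the instability concrete, here is a minimal sketch of how the
effect could be reproduced and measured: run the same image through a
few nearly identical contrast settings and tally how many distinct
readings come back. Everything named here is an assumption for
illustration, not our actual pipeline: the helper names, the contrast
factors, the file name sample.png, and the use of Pillow and
pytesseract as the enhancement and OCR back ends.

```python
from collections import Counter


def sweep_contrast(image, factors, enhance, ocr):
    """Run OCR once per contrast factor; return {factor: stripped text}."""
    return {f: ocr(enhance(image, f)).strip() for f in factors}


def tally_readings(results):
    """Return (most common non-empty text, its votes, distinct readings)."""
    counts = Counter(text for text in results.values() if text)
    if not counts:
        return None, 0, 0
    text, votes = counts.most_common(1)[0]
    return text, votes, len(counts)


def pillow_contrast(image, factor):
    """Assumed back end: Pillow, where factor 1.0 leaves the image as-is."""
    from PIL import ImageEnhance
    return ImageEnhance.Contrast(image).enhance(factor)


def tesseract_ocr(image):
    """Assumed back end: the pytesseract wrapper around the tesseract CLI."""
    import pytesseract
    return pytesseract.image_to_string(image)


# Hypothetical usage (sample.png and the factors are placeholders):
#   from PIL import Image
#   results = sweep_contrast(Image.open("sample.png"),
#                            [0.98, 1.0, 1.02],
#                            pillow_contrast, tesseract_ocr)
#   print(tally_readings(results))
```

If the tally reports more than one distinct reading across such a
narrow sweep, that is the one-pixel sensitivity described above. The
enhancement and OCR steps are passed in as functions so the sweep can
be tried with other tools.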

I'm aware this is a FAQ, and I have read all the traffic I can find on
it. Maybe, for example, if I could declare a required font size, then
tesseract would engage on the first correct letter, instead of the
first stray pixel, and get the scan right more often.

(Yes, we could dive into the training system and train a fuzzy
block-capitals font. But the next input object could use a
slightly different font, so we'd be back to square one!)

So, how can we get a more stable, reproducible scan?

-- 
  Phlip
  http://c2.com/cgi/wiki?ZeekLand

