Re: one... pixel... difference

Tom Morris Thu, 25 Oct 2012 09:43:52 -0700

On Wednesday, October 24, 2012 11:37:18 PM UTC-4, Phlip wrote:

> Tesseractors: 
>
> We are using Tesseract for an outside-of-the-box situation - not 
> scanning neatly typed documents. 
>
> Our situation is a fuzzy, low-contrast picture. But - even when I use 
> many image enhancements, such as leveling the colors, blurring them, 
> improving the contrast, shrinking the image, etc., I still get the same 
> situation. 
>
> One scan will OCR correctly into text, and the next will contain 
> garbage. Specifically, even the tiniest difference in image 
> enhancement, such as bumping the contrast from 49% to 51%, can cause 
> this effect. It's as if tesseract is sensitive to one pixel's 
> difference. 
>


I'm having a hard time understanding how you got from changing contrast to 
just changing "one pixel." Changing the contrast is likely to change *most* 
of the pixels in the image, and that has knock-on effects on the 
thresholding. I'm not surprised it has dramatic effects on the output.
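
To make that concrete, here is a rough sketch in plain Python, not 
anything taken from Tesseract itself. The linear contrast formula, the 
pivot of 128, the fixed threshold of 100, and the scale factors 0.98 and 
1.02 (stand-ins for "49%" and "51%") are all assumptions chosen for 
illustration; Tesseract's own thresholding is not modeled here. The sketch 
only counts how many pixels of a synthetic low-contrast image land on the 
other side of a global threshold after a small contrast change.

```python
def adjust_contrast(pixels, factor, pivot=128):
    """Scale each gray value about the pivot and clamp to 0..255."""
    return [min(255, max(0, round(pivot + factor * (p - pivot)))) for p in pixels]


def binarize(pixels, threshold=100):
    """Map gray values to 1 (light) or 0 (dark) with a fixed global threshold."""
    return [1 if p >= threshold else 0 for p in pixels]


# Synthetic low-contrast "image": 1000 gray values cycling through 100..140.
pixels = [100 + (i % 41) for i in range(1000)]

# Two nearly identical contrast settings (hypothetical stand-ins for 49% and 51%).
low = adjust_contrast(pixels, 0.98)
high = adjust_contrast(pixels, 1.02)

# Count gray values that differ, and pixels that switch sides of the threshold.
changed = sum(1 for a, b in zip(low, high) if a != b)
flipped = sum(1 for a, b in zip(binarize(low), binarize(high)) if a != b)

print(f"gray values that differ: {changed} of {len(pixels)}")
print(f"pixels that flip after thresholding: {flipped} of {len(pixels)}")
```

In this toy setup every pixel whose gray value sits right at the threshold 
flips together, so the binarized image differs in a whole group of pixels 
rather than in one. On a real low-contrast photo those pixels tend to lie 
along the edges of the characters, which is where it hurts recognition 
most.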


> I'm aware this is a FAQ, and I have read all the traffic I can find on 
> it. Maybe, for example, if I could declare a required font size, then 
> tesseract would engage on the first correct letter, instead of the 
> first stray pixel, and get the scan right more often. 
>
> (Yes, we could dive into the learning system, and learn us a fuzzy 
> block-capitals font. But the next input object could possibly use a 
> slightly different font, so we'd be back to square one!) 
>
> So, how to get a more stable, reproducible scan? 
>
> -- 
>   Phlip 
>   http://c2.com/cgi/wiki?ZeekLand 
>

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
