On Wednesday, October 24, 2012 11:37:18 PM UTC-4, Phlip wrote:
> Tesseractors:
>
> We are using Tesseract for an outside-of-the-box situation - not
> scanning neatly typed documents.
>
> Our situation is a fuzzy, low-contrast picture. But - even when I use
> many image enhancements, such as leveling the colors, blurring them,
> improving the contrast, shrinking the image, etc, I still get the same
> situation.
>
> One scan will OCR correctly into text, and the next will contain
> garbage. Specifically, even the tiniest difference in image
> enhancement, such as bumping the contrast from 49% to 51%, can cause
> this effect. It's as if tesseract is sensitive to one pixel's
> difference.
>
I'm having a hard time understanding how you got from changing contrast to just changing "one pixel." Changing the contrast is likely to change *most* of the pixels in the image, with knock-on effects on the thresholding. I'm not surprised it has dramatic effects.

> I'm aware this is a FAQ, and I have read all the traffic I can find on
> it. Maybe, for example, if I could declare a required font size, then
> tesseract would engage on the first correct letter, instead of the
> first stray pixel, and get the scan right more often.
>
> (Yes, we could dive into the learning system, and learn us a fuzzy
> block-capitals font. But the next input object could possibly use a
> slightly different font, so we'd be back to square-one!)
>
> So, how to get a more stable, reproducible scan?
>
> --
> Phlip
> http://c2.com/cgi/wiki?ZeekLand
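To make the point about contrast concrete, here is a minimal Python sketch comparing the 49% and 51% settings pixel by pixel. It assumes a simple linear contrast stretch about mid-gray, where a slider value of pct percent becomes a gain of 1 + pct/100. That mapping, the helper names adjust_contrast and fraction_changed, and the synthetic ramp image are all hypothetical; real image editors define their contrast sliders in their own ways.

```python
import math


def adjust_contrast(pixels, pct, mid=128):
    """Linear contrast stretch about mid-gray.

    Hypothetical mapping: a slider value of `pct` percent becomes a gain of
    1 + pct/100. Real image editors define their contrast sliders differently.
    """
    gain = 1 + pct / 100
    out = []
    for v in pixels:
        x = mid + (v - mid) * gain
        x = min(255.0, max(0.0, x))      # clamp to the 8-bit range
        out.append(math.floor(x + 0.5))  # round half up to an integer gray level
    return out


def fraction_changed(a, b):
    """Share of pixel positions whose gray value differs between a and b."""
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


# A synthetic 8-bit "image": one pixel at every gray level 0..255.
ramp = list(range(256))

at_49 = adjust_contrast(ramp, 49)
at_51 = adjust_contrast(ramp, 51)
print(f"{fraction_changed(at_49, at_51):.0%} of pixels changed between 49% and 51%")
```

The gain difference is 0.02, so any pixel at least 51 levels away from mid-gray moves by more than one whole gray level unless it is clamped. On this ramp that alone guarantees over a quarter of the pixels change, and many nearer mid-gray change through rounding. Every one of those shifts alters the histogram that the binarization step sees before recognition even starts, which is a long way from a one-pixel difference.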

