Tesseractors: We are using Tesseract for an outside-of-the-box situation - not scanning neatly typed documents.
Our input is a fuzzy, low-contrast picture. Even after many image enhancements (leveling the colors, blurring, improving the contrast, shrinking the image, etc.), I still get the same result: one scan will OCR correctly into text, and the next will contain garbage. Even the tiniest difference in image enhancement, such as bumping the contrast from 49% to 51%, can cause this effect. It's as if Tesseract is sensitive to a single pixel's difference.

I'm aware this is a FAQ, and I have read all the traffic I can find on it. Maybe, for example, if I could declare a required font size, then Tesseract would engage on the first correct letter, instead of the first stray pixel, and get the scan right more often.

(Yes, we could dive into the learning system, and learn us a fuzzy block-capitals font. But the next input object could use a slightly different font, so we'd be back to square one!)

So, how can we get a more stable, reproducible scan?

--
Phlip
http://c2.com/cgi/wiki?ZeekLand
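
To make the 49%-versus-51% sensitivity concrete, here is a toy sketch in plain Python. It is not the real pipeline and makes no claim about what Tesseract does internally. It assumes a linear contrast adjustment around mid-gray followed by a fixed global threshold, which is one plausible way a tiny contrast change could alter the black-and-white image that recognition finally sees. The gray values, the pivot of 128, the threshold of 100, and all function names are hypothetical.

```python
# Toy model (an assumption, not Tesseract's actual code): linear contrast
# adjustment around mid-gray, then a fixed global threshold.

PIVOT = 128      # hypothetical mid-gray pivot for the contrast adjustment
THRESHOLD = 100  # hypothetical fixed binarization threshold


def adjust_contrast(pixels, percent):
    """Scale each gray value's distance from PIVOT; 50% leaves the row unchanged."""
    factor = percent / 50
    return [max(0, min(255, round(PIVOT + (p - PIVOT) * factor))) for p in pixels]


def binarize(pixels, threshold=THRESHOLD):
    """Return 1 for ink (darker than the threshold), 0 for background."""
    return [1 if p < threshold else 0 for p in pixels]


def count_flips(a, b):
    """Count positions where two binarized rows disagree."""
    return sum(x != y for x, y in zip(a, b))


# Made-up scanline across one blurry dark stroke on a light background.
row = [140, 138, 120, 101, 100, 99, 95, 90, 96, 99, 100, 104, 125, 139]

low = binarize(adjust_contrast(row, 49))
high = binarize(adjust_contrast(row, 51))

print("ink at 49%:", low)
print("ink at 51%:", high)
print("pixels that flipped:", count_flips(low, high))
```

In this toy row, only the soft edge pixels sitting right at the threshold change sides, but that is enough to change the apparent width of the stroke. If something similar happens in a real scan, it would explain why blurry edges make the result so unstable.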