On Thu, Oct 25, 2012 at 8:21 AM, Gaara Sabaku <[email protected]> wrote:
> tesseract can improve its accuracy if it can scan the document multiple
> times. The only way to achieve this benefit is to include the repetition in
> a single tiff with multiple pages. You can identify single instances of
> "hard" words when parsing the "confidence" of the output.

Awesome - I'll create a small range of varying contrasts, pack 'em in a tiff, and run them all at once. That's just the kind of high-level algorithmic help I needed!

On Thu, Oct 25, 2012 at 9:43 AM, Tom Morris <[email protected]> wrote:
> I'm having a hard time understanding how you got from changing contrast to
> just changing "one pixel." Changing the contrast is more likely to change
> *most* of the pixels in the image and have knock-on effects on the
> thresholding. I'm not surprised it has dramatic effects.

"One pixel" is shorthand for "unduly sensitive to initial conditions". Two files can appear to the human eye to be the same; tesseract will correctly parse one, and emit pure gibberish for the other.
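A minimal Python sketch of both ideas in this thread: building a small range of contrast variants, and seeing how a contrast change touches every pixel and then shifts the binarized result. Everything here is an illustrative assumption rather than tesseract's behavior: the 4x4 "page", the linear stretch about the image mean, the fixed threshold of 128, and the helper names. Tesseract's own thresholding is not reproduced, and packing the variants into a multi-page tiff is left out because it needs an imaging library.

```python
def adjust_contrast(pixels, factor):
    """Stretch gray values away from (or toward) the image mean, clamped to 0..255."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [
        [max(0, min(255, round(mean + factor * (p - mean)))) for p in row]
        for row in pixels
    ]


def binarize(pixels, threshold=128):
    """Fixed global threshold (an assumption): 1 marks ink, 0 marks background."""
    return [[1 if p < threshold else 0 for p in row] for row in pixels]


def count_differences(a, b):
    """Number of positions where two same-sized images differ."""
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


# Hypothetical page: faint gray "ink" (140) on a light background (200).
page = [
    [200, 200, 200, 200],
    [200, 140, 140, 200],
    [200, 140, 140, 200],
    [200, 200, 200, 200],
]

# A small range of contrast variants, as proposed in the reply above.
factors = [0.5, 1.0, 1.5, 2.0]
variants = {f: adjust_contrast(page, f) for f in factors}

# How many pixels binarize as ink in each variant.
ink_counts = {f: sum(map(sum, binarize(v))) for f, v in variants.items()}

# Compare the original with the strongest variant.
boosted = variants[2.0]
gray_changed = count_differences(page, boosted)
ink_changed = count_differences(binarize(page), binarize(boosted))

for f in factors:
    print(f"factor {f}: {ink_counts[f]} ink pixels")
print(f"gray values changed: {gray_changed}, binarized pixels flipped: {ink_changed}")
```

In this toy case every gray value changes at factor 2.0, and the faint ink only crosses the threshold once the contrast is raised. The two lower-contrast variants would look much the same to a human reader, yet they binarize to a blank page.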

