On Thu, Oct 25, 2012 at 8:21 AM, Gaara Sabaku
<[email protected]> wrote:

> Tesseract can improve its accuracy if it can scan the document multiple
> times. The only way to achieve this benefit is to include the repetition in
> a single tiff with multiple pages. You can identify single instances of
> "hard" words when parsing the "confidence" of the output.

Awesome - I'll create a small range of varying contrasts, pack 'em into a
multi-page tiff, and run them all at once. That's just the kind of
high-level algorithmic help I needed!
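
For anyone following along, here is a minimal sketch of the "range of
varying contrasts" step. It assumes 8-bit grayscale pixel values and a
simple linear contrast change around mid-gray; the function names and the
factor list are made up for illustration and are not part of tesseract.
Packing the variants into one multi-page tiff would still need an imaging
library, which is left out here.

```python
def adjust_contrast(pixels, factor):
    """Scale 8-bit grayscale values around mid-gray (128), clamped to 0..255.

    Assumption: a plain linear contrast change; real tools may use other curves.
    """
    return [max(0, min(255, round(128 + (p - 128) * factor))) for p in pixels]


def contrast_variants(pixels, factors=(0.8, 0.9, 1.0, 1.1, 1.2)):
    """Return one adjusted copy of the pixel data per contrast factor.

    The default factors are hypothetical; each copy would become one page.
    """
    return [adjust_contrast(pixels, f) for f in factors]


if __name__ == "__main__":
    row = [0, 64, 128, 192, 255]  # one row of a toy grayscale image
    for factor, page in zip((0.8, 0.9, 1.0, 1.1, 1.2), contrast_variants(row)):
        print(factor, page)
```

A factor of 1.0 leaves the data unchanged, so the original scan is always
one of the pages.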


On Thu, Oct 25, 2012 at 9:43 AM, Tom Morris <[email protected]> wrote:

> I'm having a hard time understanding how you got from changing contrast to
> just changing "one pixel." Changing the contrast is more likely to change
> *most* of the pixels in the image and have knock-on effects on the
> thresholding.  I'm not surprised it has dramatic effects.

It's shorthand for "unduly sensitive to initial conditions". Two files
can appear to the human eye to be the same; tesseract will correctly
parse one, and emit pure gibberish for the other.
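
To make the thresholding point concrete, here is a toy sketch. It assumes
a fixed global threshold of 110 and a linear contrast change around 128;
both are assumptions chosen for illustration and are not a description of
tesseract's actual binarization. It counts how many pixels land on the
other side of the threshold after a modest contrast change.

```python
def scale_contrast(pixels, factor):
    """Linear contrast change around mid-gray (128), clamped to 0..255."""
    return [max(0, min(255, round(128 + (p - 128) * factor))) for p in pixels]


def binarize(pixels, threshold=110):
    """Fixed global threshold (hypothetical value): 1 = light, 0 = dark."""
    return [1 if p >= threshold else 0 for p in pixels]


def count_flips(pixels, factor, threshold=110):
    """Count pixels whose binarized value changes after the contrast change."""
    before = binarize(pixels, threshold)
    after = binarize(scale_contrast(pixels, factor), threshold)
    return sum(1 for b, a in zip(before, after) if b != a)


if __name__ == "__main__":
    # Toy data: several values sit close to the threshold, like faint strokes.
    pixels = [105, 108, 112, 115, 200, 30]
    for factor in (1.0, 1.2, 1.5):
        print(factor, count_flips(pixels, factor))
```

Pixels far from the threshold never flip; only the ones near it do, and
those are the ones that look identical to a human reader.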

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
