This is a good example of a serious problem Tesseract has with any
image that mixes font sizes, such as a street sign, flyer, or business
card. Tesseract's adaptive classifier makes assumptions about letter
height from the characters it has already seen and uses them when
recognizing the next ones. That is right and useful within a word or
(to a lesser degree) a sentence of space-separated words, because
there it makes sense to assume uniformity. However, it is dead wrong
across different blocks. In your case, the tall bar is separated by
enough space that it should be treated as its own block, and it
should NOT cause Tesseract to assume ANYTHING about letter height when
it tackles the next block with the phone number.

The good news is that the fix required in Tesseract is really not that
hard: it essentially comes down to resetting the adaptive classifier
between blocks (separated by a gap larger than a blank, either
vertically or, as in your example, horizontally). Even better news:
Jimmy is working on it ...
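
To make the "blocks separated by a gap larger than a blank" idea
concrete, here is a rough sketch in Python. It is not Tesseract code and
not the fix Jimmy is working on; the function name, the 0/1 pixel
representation and the min_gap threshold are all my own assumptions for
illustration. It only finds horizontally separated blocks, like the bar
and the phone number in your image.

```python
def split_into_blocks(pixels, min_gap):
    """Split a binarized image into horizontally separated blocks.

    pixels:  list of rows, each a list of 0/1 values (1 = ink).
             This representation is an assumption for the sketch.
    min_gap: number of consecutive empty columns that counts as a
             block boundary (hypothetical tuning parameter; pick it
             larger than the width of a normal blank).

    Returns a list of (start, end) column ranges, end exclusive.
    """
    if not pixels:
        return []
    width = len(pixels[0])
    # A column is "inked" if any row has ink in it.
    inked = [any(row[x] for row in pixels) for x in range(width)]

    blocks = []
    start = None     # first column of the current block
    last_ink = None  # most recent inked column in the current block
    for x, has_ink in enumerate(inked):
        if not has_ink:
            continue
        if start is None:
            start = x
        elif x - last_ink - 1 >= min_gap:
            # The empty run since the last ink is wide enough:
            # close the current block and start a new one.
            blocks.append((start, last_ink + 1))
            start = x
        last_ink = x
    if start is not None:
        blocks.append((start, last_ink + 1))
    return blocks


# Tiny example: a one-column "bar", a wide gap, then some "text".
row = [1, 0, 0, 0, 0, 1, 1, 0, 1]
blocks = split_into_blocks([row, row], min_gap=3)
```

Each returned range could then be cropped out and recognized on its own,
so the bar never gets a chance to influence the phone number. Whether
that actually helps depends on the classifier state not carrying over
between the separate recognitions, which I have not verified for
tessnet.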

On Jul 18, 11:40 pm, KAH <[email protected]> wrote:
> I have two files....
>
> http://dl.dropbox.com/u/1531272/pg1-CROP.jpg
> and http://dl.dropbox.com/u/1531272/pg1-CROP-Lines.jpg
>
> Note on the "Lines" file there are dark lines on the left and right
> side of this image.
> I am trying to understand why the tessnet dll would render such
> different readings for this image.
>
> Can anyone offer some help or understanding regarding how this product
> reads that would cause this?  Additionally if there are any variables
> I would set that would help I would love to have some direction on
> them.
>
> Thank you for your help.
> KAH
