On 19 July 2010 13:30, Jimmy O'Regan <[email protected]> wrote:
> On 19 July 2010 13:20, patrickq <[email protected]> wrote:
>> This is a great example of a serious problem with Tesseract when
>> analyzing any image with fonts of variable sizes, such as a street
>> sign, flyer, or business card. Tesseract's adaptive classifier makes
>> assumptions about letter heights and uses that knowledge when
>> recognizing the next characters. This is right and useful when
>> parsing a word or (to a lesser degree) a sentence with words
>> separated by spaces, because in that case it makes sense to assume
>> uniformity. However, it is dead wrong when dealing with different
>> blocks. In your case, the tall bar is separated by enough space that
>> it should be treated as a different block, and that letter should NOT
>> cause Tesseract to assume ANYTHING about letter height when it
>> tackles the next block with the phone number.
>>
>> The good news is that the fix required in Tesseract is really not that
>> hard: it's essentially about resetting the adaptive classifier between
>> blocks (separated by a space larger than a blank, either vertically
>> or, as in your example, horizontally). Even better news: Jimmy is
>> working on it ...
>
> Well, it won't do him any good because he's using tessnet2, so he
> won't get the fix if/when I find it.

My apologies; I assumed 'he', which was quite a sexist assumption to make.
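
For illustration only, the block-reset idea quoted above could be sketched roughly as follows. This is not Tesseract code and not the actual fix: the `Box` type, the `gap_factor` threshold and both function names are hypothetical, and only the horizontal case (like the tall bar next to the phone number) is handled. Each block gets its own letter-height estimate, so nothing learned from one block carries over to the next.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Bounding box of one character-like blob (hypothetical type)."""
    left: int
    top: int
    right: int
    bottom: int


def split_into_blocks(boxes, gap_factor=1.5):
    """Group boxes, scanned left to right, into blocks.

    A new block starts when the horizontal gap to the previous box is
    larger than gap_factor times the height of the shorter of the two
    boxes. gap_factor is an assumed stand-in for "larger than a blank".
    """
    blocks = []
    for box in sorted(boxes, key=lambda b: b.left):
        if blocks:
            prev = blocks[-1][-1]
            gap = box.left - prev.right
            ref_height = min(prev.bottom - prev.top, box.bottom - box.top)
            if gap <= gap_factor * ref_height:
                blocks[-1].append(box)
                continue
        blocks.append([box])
    return blocks


def estimate_heights(boxes, gap_factor=1.5):
    """Return one mean letter height per block.

    The height list is rebuilt from scratch for every block, which is
    the "reset between blocks" step in miniature.
    """
    estimates = []
    for block in split_into_blocks(boxes, gap_factor):
        heights = [b.bottom - b.top for b in block]  # fresh state per block
        estimates.append(sum(heights) / len(heights))
    return estimates
```

With a tall bar followed, after a wide gap, by three small digits, the bar ends up in its own block and does not influence the height estimate used for the digits.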

-- 
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.
