On 19 July 2010 13:30, Jimmy O'Regan wrote:
> On 19 July 2010 13:20, patrickq wrote:
>> This is a great example of a serious problem with Tesseract when
>> analyzing any image with fonts of variable sizes such as a street
>> sign, flyer, business card etc. What happens is that Tesseract's
>> adaptive classifier makes assumptions about letter heights and uses
>> that knowledge when recognizing the next characters. This is right and
>> useful when parsing a word or (to a lesser degree but still) a
>> sentence with words separated by spaces because in that case it makes
>> sense to assume uniformity. However it is dead wrong when dealing with
>> different blocks. In your case, the tall bar is separated by enough
>> space that it should be treated as a different block and that letter
>> should NOT cause Tesseract to assume ANYTHING about letter height when
>> it tackles the next block with the phone number.
>>
>> The good news is that the fix required in Tesseract is really not that
>> hard, it's essentially about resetting the adaptive classifier between
>> blocks (separated by space larger than a blank vertically or like your
>> example, horizontally). Even better news: Jimmy is working on it ...
>
> Well, it won't do him any good because he's using tessnet2, so he
> won't get the fix if/when I find it.

My apologies; I assumed 'he', which was quite a sexist assumption to make.

-- 
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.

