Re: Nearly Identical Characters

Nick White Fri, 23 Nov 2012 18:09:57 -0800

On Fri, Nov 23, 2012 at 09:20:23AM -0600, John Williams wrote:
> Has anyone dealt with performing OCR with fonts that have very similar, if not
> identical, characters? In the font I'm dealing with, capital "I" (as in ivory)
> and lowercase "l" (as in llama) look pretty much the same. I've attached a
> couple samples with the words "Plain" and "IS."


Tesseract takes case into account, so it is unlikely to insert a
capital I into the middle of a word instead of an l. At least in
theory. There may well be a config variable you can change to
increase the penalty for "unexpected case" as well, though I
haven't actually done anything with case myself, and I may be
misremembering.
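
To illustrate the "unexpected case" idea, here is a minimal Python sketch
of a post-processing pass over OCR output. It is not how Tesseract does
it internally, just a simple heuristic under my own assumptions: the
function names fix_il_case and fix_text are made up for this example,
and the rule only looks at the case of the other letters in the word.

```python
import re


def fix_il_case(word: str) -> str:
    """Swap 'I'/'l' when the case of the rest of the word makes one implausible.

    Hypothetical helper for illustration only, not part of Tesseract.
    """
    # Letters after the first position, ignoring the ambiguous glyphs themselves.
    rest = [c for c in word[1:] if c.isalpha() and c not in "Il"]
    if not rest:
        return word  # no evidence either way, e.g. "All" or "I"
    if all(c.isupper() for c in rest):
        # Word looks all-caps, so any lowercase l is probably a capital I.
        return word.replace("l", "I")
    if all(c.islower() for c in rest):
        # Word looks lowercase or Capitalized, so an I after the first
        # position is probably a lowercase l.
        return word[0] + word[1:].replace("I", "l")
    return word  # mixed case: leave it alone


def fix_text(text: str) -> str:
    """Apply the heuristic to every run of ASCII letters in the text."""
    return re.sub(r"[A-Za-z]+", lambda m: fix_il_case(m.group()), text)


if __name__ == "__main__":
    print(fix_text("PIain text lS here"))
```

With the two sample words from the original question, a misread "PIain"
would come back as "Plain" and a misread "lS" as "IS". Names with an
internal capital I (McIntosh, say) would be damaged by this rule, so it
would need a word list or an exception list before real use.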

Have you tested it, and if so, in what sort of places were the two
getting mixed up?
