On Fri, Nov 23, 2012 at 09:20:23AM -0600, John Williams wrote:
> Has anyone dealt with performing OCR with fonts that have very similar, if not
> identical, characters? In the font I'm dealing with, capital "I" (as in ivory)
> and lowercase "l" (as in llama) look pretty much the same. I've attached a
> couple samples with the words "Plain" and "IS."

Tesseract takes case into account, so in theory it is unlikely to insert a capital I in the middle of a word in place of an l. There may also be a config variable you can change to increase the penalty for "unexpected case", though I haven't worked with case handling myself and may be misremembering.

Have you tested it, and in what sort of places were the two getting mixed up?
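
If the mix-ups persist, one fallback is to clean up the recognised text afterwards. The following is only a rough sketch of the "use the case of the neighbouring letters" idea, not anything Tesseract does internally: the function name `fix_case_confusion` and its rules are my own assumptions, and it leaves a word-initial "I"/"l" followed by lowercase letters alone because both readings are plausible there (ivory vs. llama).

```python
AMBIGUOUS = {"I", "l"}  # the two glyphs that look alike in this font


def fix_case_confusion(word: str) -> str:
    """Hypothetical post-OCR heuristic: pick 'I' or 'l' from neighbouring case."""
    fixed = []
    for i, ch in enumerate(word):
        if ch in AMBIGUOUS:
            # Nearest letters on each side that are not themselves ambiguous.
            left = next((c for c in reversed(word[:i])
                         if c.isalpha() and c not in AMBIGUOUS), None)
            right = next((c for c in word[i + 1:]
                          if c.isalpha() and c not in AMBIGUOUS), None)
            lower_neighbour = ((left is not None and left.islower())
                               or (right is not None and right.islower()))
            if i > 0 and lower_neighbour:
                # Inside a word next to lowercase text: a capital is unexpected.
                ch = "l"
            elif (right is not None and right.isupper()
                  and (left is None or left.isupper())):
                # Surrounded by (or leading into) capitals: read it as "I".
                ch = "I"
        fixed.append(ch)
    return "".join(fixed)


if __name__ == "__main__":
    for sample in ["PIain", "lS", "Plain", "IS"]:
        print(sample, "->", fix_case_confusion(sample))
```

With the two sample words from the original post, a misread "PIain" comes back as "Plain" and "lS" as "IS", while correct input is left unchanged. Names and abbreviations with mixed case would need a dictionary check on top of this.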

