On Mon, Apr 29, 2013 at 07:00:47AM -0700, Michael Sander wrote:
> On a related note, why is tesseract even generating these characters in the
> first place given the fact that I chose English as the training data?

They are English characters. They're ligatures, which are used a lot in printed English. Look closely at fi and fl in the nicest printed books you have and you'll find they are joined differently than they would be if they were just separate letters. So it is reasonable for Tesseract to try to recognise them when they're used, as its goal is recognising printed text.

Nick

-- 
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to tesseract-ocr@googlegroups.com
To unsubscribe from this group, send email to tesseract-ocr+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
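
For anyone who would rather have plain letters in the recognised text, one option is to post-process the output. Below is a minimal sketch in Python, assuming the OCR output is already available as a Unicode string. The function names and the table are illustrative only and are not part of Tesseract.

```python
import unicodedata

# Latin f-ligatures from the Unicode "Alphabetic Presentation Forms" block,
# mapped to their plain-letter spellings. (Illustrative table, not a Tesseract API.)
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}
_LIGATURE_TABLE = str.maketrans(LIGATURES)


def expand_ligatures(text: str) -> str:
    """Replace only the f-ligature code points listed above."""
    return text.translate(_LIGATURE_TABLE)


def expand_ligatures_nfkc(text: str) -> str:
    """Broader alternative: Unicode NFKC compatibility normalisation.

    This also rewrites other compatibility characters (for example
    superscript digits), so it changes more than just ligatures.
    """
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    sample = "\ufb01nd the \ufb02oor"
    print(expand_ligatures(sample))
```

The explicit table touches nothing except the five ligatures, which is usually what you want for OCR text. The NFKC variant is shorter but more aggressive.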