On Mon, Apr 29, 2013 at 07:00:47AM -0700, Michael Sander wrote:
> On a related note, why is tesseract even generating these characters in the
> first place given the fact that I chose English as the training data?

They are English characters. They're ligatures, which are used a lot
in printed English. Look closely at fi and fl in the nicest printed
books you have and you'll find the two letters are joined into a
single shape, rather than set as two separate letters.

So it is reasonable for Tesseract to try to recognise when they're
used, as its goal is recognising printed text.
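
If you'd rather have plain letters in your output, one option is to
expand the ligatures after OCR. This is only a rough sketch of
post-processing in Python, not something Tesseract does for you, and
the function name expand_ligatures is made up for illustration. It
relies on Unicode NFKC normalisation, which maps the ligature code
points (U+FB01 for fi, U+FB02 for fl) back to their separate letters:

```python
import unicodedata


def expand_ligatures(text: str) -> str:
    """Expand compatibility characters such as the fi/fl ligatures.

    Hypothetical helper: NFKC normalisation replaces each ligature
    code point with the plain letters it stands for.
    """
    return unicodedata.normalize("NFKC", text)


# U+FB01 is the "fi" ligature and U+FB02 is the "fl" ligature.
sample = "The \ufb01rst \ufb02oor"
print(expand_ligatures(sample))  # prints "The first floor"
```

Be aware that NFKC also rewrites other compatibility characters
(superscript digits, fullwidth forms and so on), so if you only want
to touch the ligatures, an explicit replacement table is probably
safer.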

Nick

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to tesseract-ocr@googlegroups.com
To unsubscribe from this group, send email to
tesseract-ocr+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
