Hi Stuart,
if the characters that touch do so consistently, then
maybe you can train your own "language" that includes
the pairs of characters that usually connect.
I'm pretty sure that Google already does this for
ligatures like "fi" and "fl".  You can then tell Tesseract
to use both English ("eng") and your new "language" when
doing OCR.  I've never done any training myself, and I usually
consider it a waste of time for English, but
in this case it may be worth trying if correcting
by hand is going to take a really long time.
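
For what it's worth, here is a rough, untested sketch of how the two
languages could be combined from Python. Tesseract's -l option accepts
several languages joined with "+". The name "ligs" is made up; it stands
for whatever you end up calling your trained data. The file names and
both helper functions are placeholders as well.

```python
import subprocess


def build_tesseract_command(image_path, output_base, languages):
    """Return the argv list for a tesseract run that uses several languages."""
    if not languages:
        raise ValueError("at least one language is required")
    # Tesseract's -l option accepts multiple languages joined with '+'.
    return ["tesseract", image_path, output_base, "-l", "+".join(languages)]


def run_ocr(image_path, output_base, languages):
    """Run tesseract; needs the binary on PATH and a <name>.traineddata
    file in the tessdata directory for every language listed."""
    subprocess.run(
        build_tesseract_command(image_path, output_base, languages),
        check=True,
    )


# "ligs" is a hypothetical name for the custom trained language.
command = build_tesseract_command("page.png", "page", ["eng", "ligs"])
print(" ".join(command))
```

Calling run_ocr("page.png", "page", ["eng", "ligs"]) would then write the
recognized text to page.txt, assuming ligs.traineddata has been installed
next to eng.traineddata.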

Cheers,
Rob

--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en