Thanks for the idea. I have scanned the first 40 pages and have a little script running to clean and OCR them.
I will see how much of a pain it is tomorrow... I can't face it this evening.

Thanks,
Stuart

On Fri, Sep 13, 2013 at 8:59 PM, Robert Komar <[email protected]> wrote:
> Hi Stuart,
> if the characters that touch do so consistently, then
> maybe you can train your own "language", including
> in it the pairs of characters that usually connect.
> I'm pretty sure that Google already does this for
> cases like "fi" and "fl". You can then tell tesseract
> to use both "english" and your new "language" when
> doing OCR. I've never trained myself, and usually
> consider it to be a waste of time for English, but
> in this case, it may be worth trying if correcting
> by hand is going to take a really long time.
>
> Cheers,
>
> Rob

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
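
For anyone who wants to try Rob's suggestion of using "english" together with a custom-trained "language", here is a minimal sketch of how the call might look from a Python script. It assumes a tesseract version that accepts several languages joined with "+" in the -l option, and that the custom traineddata file is already installed. The name "ligs" for that custom language, and the helper names build_tesseract_command and ocr_page, are made up for illustration. They are not part of tesseract or of Stuart's script.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, languages):
    """Return the argv list for one tesseract run that combines languages."""
    if not languages:
        raise ValueError("at least one language is required")
    # Several traineddata names are passed as one argument joined with '+'.
    lang_arg = "+".join(languages)
    return ["tesseract", str(image_path), str(output_base), "-l", lang_arg]


def ocr_page(image_path, languages=("eng", "ligs")):
    """OCR one page image and return the path of the text file written.

    "ligs" is a hypothetical custom language trained on touching pairs.
    """
    image_path = Path(image_path)
    # tesseract appends ".txt" to the output base name itself.
    output_base = image_path.with_suffix("")
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    cmd = build_tesseract_command(image_path, output_base, languages)
    subprocess.run(cmd, check=True)
    return Path(str(output_base) + ".txt")
```

Calling ocr_page("page01.tif") would run the equivalent of "tesseract page01.tif page01 -l eng+ligs". Whether the extra language actually helps with the touching characters would still need checking against a few hand-corrected pages.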

