Thanks for the idea.

I have scanned the first 40 pages and have a little script running to clean
and OCR them.

I will see how much of a pain it is tomorrow... I can't face it this evening.

Thanks,

Stuart


On Fri, Sep 13, 2013 at 8:59 PM, Robert Komar <[email protected]> wrote:

> Hi Stuart,
> if the characters that touch do so consistently, then
> maybe you can train your own "language", including
> in it the pairs of characters that usually connect.
> I'm pretty sure that Google already does this for
> cases like "fi" and "fl".  You can then tell tesseract
> to use both "english" and your new "language" when
> doing OCR.  I've never trained myself, and usually
> consider it to be a waste of time for English, but
> in this case, it may be worth trying if correcting
> by hand is going to take a really long time.
>
> Cheers,
>
> Rob
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> tesseract-ocr+unsubscribe@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en


