On Nov 2, 2009, at 11:32 AM, Craig White wrote:

On Mon, 2009-11-02 at 08:31 -0700, Matt Graham wrote:
I spent 3 or 4 years doing stuff like this on the NYT, Wall Street
Journal, Christian Science Monitor, and Boston Globe.  You will NOT
be able to get decent OCR with free software.  Newspapers require
a different approach than most OCR packages take; you have to split
each article up into multiple individual image files and OCR each
file separately, then stitch the results back together.  And editing
the results is totally necessary since newspaper text is so horrible
in quality.
----
I don't know anything about GOCR at all.

A few years ago I set up tesseract, and in terms of accuracy it worked
as well as any OCR program I have seen, though it clearly has many
limitations compared to something like OmniPage. In the end it was
rather easy to install and get working.

http://code.google.com/p/tesseract-ocr/
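
For what it's worth, the split-and-stitch approach Matt describes above could be sketched roughly like this in Python. This is only an illustration under a few assumptions: a `tesseract` binary is on the PATH, the article has already been cropped into one image file per region (in reading order) by some other tool, and the function names `run_tesseract` and `ocr_article` are made up for this example. It relies only on the basic `tesseract IMAGE OUTPUTBASE` invocation, which writes the text to OUTPUTBASE.txt.

```python
import subprocess
import tempfile
from pathlib import Path


def run_tesseract(image_path):
    """OCR one image with the tesseract CLI and return the recognized text.

    Assumes `tesseract` is installed and on PATH. The command
    `tesseract IMAGE OUTPUTBASE` writes its result to OUTPUTBASE.txt.
    """
    with tempfile.TemporaryDirectory() as tmp:
        out_base = Path(tmp) / "region"
        subprocess.run(
            ["tesseract", str(image_path), str(out_base)],
            check=True,           # raise if tesseract exits with an error
            capture_output=True,  # keep tesseract's console chatter quiet
        )
        return Path(str(out_base) + ".txt").read_text(encoding="utf-8")


def ocr_article(region_images, ocr=run_tesseract):
    """OCR each pre-cropped region in the order given and stitch the text.

    `ocr` is any callable mapping an image path to text, so a different
    engine (or a stub for testing) can be swapped in.
    """
    pieces = []
    for image_path in region_images:
        text = ocr(image_path).strip()
        if text:  # skip regions where OCR produced nothing
            pieces.append(text)
    # Separate regions with a blank line so the seams are easy to find
    # when proofreading.
    return "\n\n".join(pieces)
```

Calling something like `ocr_article(["headline.tif", "col1.tif", "col2.tif"])` (hypothetical file names) would return one block of text per region, joined by blank lines. The result would still need editing by hand, as Matt says.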

Google uses tesseract in its ocropus project. Ocropus seems promising but is still at a fairly early stage.
http://code.google.com/p/ocropus/

alex


---------------------------------------------------
PLUG-discuss mailing list - [email protected]
To subscribe, unsubscribe, or to change your mail settings:
http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss
