On Nov 2, 2009, at 11:32 AM, Craig White wrote:
On Mon, 2009-11-02 at 08:31 -0700, Matt Graham wrote:
I spent 3 or 4 years doing stuff like this on the NYT, Wall Street Journal, Christian Science Monitor, and Boston Globe. You will NOT be able to get decent OCR with free software. Newspapers require a different approach than most OCR packages take; you have to split each article up into multiple individual image files and OCR each file separately, then stitch the results back together. And editing the results is totally necessary since newspaper text is so horrible in quality.
----
I don't know anything about GOCR at all.

A few years ago I set up tesseract and it worked as well as I have seen any OCR program work (in terms of accuracy), though clearly there are many limitations compared to something like Omnipage. In the end it was rather easy to install and get it working.

http://code.google.com/p/tesseract-ocr/
Google uses tesseract in their ocropus project. Ocropus seems promising, but is still at a fairly early stage.
http://code.google.com/p/ocropus/

alex
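A minimal Python sketch of the workflow Matt describes: each article is already cut into separate image files, each file is OCR'd on its own, and the results are stitched back together in reading order. The `Clip` type, the `ocr_articles` and `tesseract_cli` functions, and the `order` field are hypothetical names for illustration. Cropping is assumed to have been done beforehand, and the default OCR step shells out to the tesseract command-line tool, assuming a version that accepts `stdout` as its output base. The stitched text still needs hand editing, as Matt points out.

```python
import subprocess
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Clip:
    """One pre-cut image file holding part of an article (hypothetical layout)."""
    article_id: str  # which article this piece belongs to
    order: int       # reading order of the piece within its article
    image_path: str  # path to the already-cropped image file


def tesseract_cli(image_path: str) -> str:
    """OCR one image with the tesseract CLI and return the recognized text.

    Assumes a tesseract build that accepts 'stdout' as the output base.
    """
    result = subprocess.run(
        ["tesseract", image_path, "stdout"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def ocr_articles(clips: Iterable[Clip],
                 ocr: Callable[[str], str] = tesseract_cli) -> Dict[str, str]:
    """OCR every clip separately, then stitch each article's pieces back together."""
    by_article: Dict[str, List[Clip]] = defaultdict(list)
    for clip in clips:
        by_article[clip.article_id].append(clip)

    articles: Dict[str, str] = {}
    for article_id, pieces in by_article.items():
        pieces.sort(key=lambda c: c.order)  # restore reading order
        texts = [ocr(c.image_path).strip() for c in pieces]
        # Join the non-empty pieces; empty clips (blank crops) are dropped.
        articles[article_id] = "\n".join(t for t in texts if t)
    return articles
```

Because the OCR step is a parameter, a plain dictionary lookup can stand in for tesseract when checking the stitching logic, and a different engine can be swapped in without touching the grouping and ordering code.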
--------------------------------------------------- PLUG-discuss mailing list - [email protected] To subscribe, unsubscribe, or to change your mail settings: http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss
