[CODE4LIB] OCR To ALTO without ABBYY

2012-09-06 Thread Michael Beccaria

I inadvertently purchased ABBYY FineReader 11 Corporate thinking that it would be capable of outputting to ALTO XML. I was wrong. ABBYY FineReader Engine does :/ Ultimately, I want to OCR some newspaper images and export them to ALTO XML and, until the proof of concept is done, I want to try to

Re: [CODE4LIB] OCR To ALTO without ABBYY

2012-09-06 Thread Bridger Dyson-Smith

You might take a look at Tesseract [1]. On a typical Linux box:

$ tesseract input.tif outputName hocr

renders HTML with some coordinate information. You might be able to process from that output to ALTO.

Cheers,
Bridger

[1] http://code.google.com/p/tesseract-ocr/

On Thu, Sep 6, 2012 at 8:29
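For what it's worth, the hOCR-to-ALTO step suggested above could be roughed out along these lines. This is only a sketch under several assumptions: the hOCR is well-formed XHTML, lines carry class "ocr_line" and words carry class "ocrx_word" with a "bbox x0 y0 x1 y1" entry in the title attribute (class names can differ between Tesseract versions), and the output is a bare ALTO-style skeleton with no namespace or Description block, so it is not schema-valid ALTO. The function name hocr_to_alto and the SAMPLE markup are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# hOCR stores coordinates in the title attribute, e.g. "bbox 10 20 90 50".
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://www.w3.org/1999/xhtml}'."""
    return tag.rsplit("}", 1)[-1]


def hocr_to_alto(hocr_text):
    """Map hOCR lines/words to a simplified ALTO-like tree (hypothetical helper)."""
    root = ET.fromstring(hocr_text)
    alto = ET.Element("alto")  # assumption: namespace and Description omitted
    layout = ET.SubElement(alto, "Layout")
    page = ET.SubElement(layout, "Page")
    space = ET.SubElement(page, "PrintSpace")
    block = ET.SubElement(space, "TextBlock")  # everything goes in one block

    for line in root.iter():
        if local_name(line.tag) != "span" or line.get("class") != "ocr_line":
            continue
        text_line = ET.SubElement(block, "TextLine")
        for word in line.iter():
            if word.get("class") != "ocrx_word":
                continue
            match = BBOX_RE.search(word.get("title", ""))
            if match is None:
                continue
            x0, y0, x1, y1 = map(int, match.groups())
            # ALTO records position plus size rather than two corners.
            ET.SubElement(
                text_line,
                "String",
                CONTENT="".join(word.itertext()).strip(),
                HPOS=str(x0),
                VPOS=str(y0),
                WIDTH=str(x1 - x0),
                HEIGHT=str(y1 - y0),
            )
    return ET.tostring(alto, encoding="unicode")


# Hand-written sample standing in for real Tesseract output.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<div class="ocr_page" title="bbox 0 0 800 600">
<span class="ocr_line" title="bbox 10 20 200 50">
<span class="ocrx_word" title="bbox 10 20 90 50">Hello</span>
<span class="ocrx_word" title="bbox 100 20 200 50">world</span>
</span></div></body></html>"""

if __name__ == "__main__":
    print(hocr_to_alto(SAMPLE))
```

Real Tesseract output may include a DOCTYPE or markup that a strict XML parser rejects, in which case a more forgiving HTML parser would be needed, and a real ALTO file would also need the page dimensions, measurement unit, and proper block segmentation.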