On Tue, Jan 3, 2012 at 5:49 PM, Laurence Penney <[email protected]> wrote:
> I see in the Thoreau that there are numerous cases where ‘ll’ is mistaken for
> ‘U’. It would be splendid if, after just a few of these were fixed manually,
> something could suggest performing numerous other replacements — particularly
> cases where ‘ll’ was already a candidate for the OCR of that word-part. Is
> this something that Abbyy can be induced to do?

The best place to incorporate this feedback is the training process for the
recognition engine, so that it can use all the other information that it has
available at that point to improve the recognition process.

Is there a description of the scanning, image processing, recognition, and
text post-processing pipeline anywhere? It was described as open source at
introduction, but the referenced source repository
(http://sourceforge.net/projects/scribesw/) hasn't been touched in 5+ years,
so it seems pretty unlikely that it represents the software which is actually
in use.

There's a blog post discussing correction of IA texts from last summer:
http://iphylo.blogspot.com/2011/07/correcting-ocr-using-hocr-firefox.html

The Firefox plugin could be used directly if the files were stored in hOCR
format instead of ABBYY's proprietary XML, but it's a straightforward
conversion process.

Tom
_______________________________________________
Ol-discuss mailing list
[email protected]
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-discuss
To unsubscribe from this mailing list, send email to [email protected]
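
For illustration only, the "suggest other replacements" idea from the quoted
question could be approximated as a text post-processing pass, which is a
different (and weaker) approach than feeding corrections back into engine
training. The sketch below is an assumption-laden toy: the function name, the
tiny lexicon, and the token list are all hypothetical, and it does not use
ABBYY's per-character candidate lists at all. It only flags tokens that are not
known words but become known words once 'U' is swapped for 'll'.

```python
def suggest_ll_fixes(tokens, lexicon, confusion=("U", "ll")):
    """Return {token: suggestion} for tokens that are not known words
    but become known words when the confused glyph is swapped."""
    wrong, right = confusion
    suggestions = {}
    for token in tokens:
        # Skip tokens without the suspect glyph, and tokens that are
        # already valid words (e.g. "Until" should be left alone).
        if wrong not in token or token.lower() in lexicon:
            continue
        candidate = token.replace(wrong, right)
        if candidate.lower() in lexicon:
            suggestions[token] = candidate
    return suggestions


# Hypothetical data: a real pass would load a full word list and the OCR text.
lexicon = {"well", "all", "shall", "walden", "the", "usual", "until"}
tokens = ["weU", "aU", "Until", "shaU", "Walden", "Uxx"]

print(suggest_ll_fixes(tokens, lexicon))
# {'weU': 'well', 'aU': 'all', 'shaU': 'shall'}
```

The suggestions would still need a human to confirm them; a dictionary check
alone cannot tell a genuine capital U from a misread 'll' in proper names or
abbreviations.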
