[ol-tech] How are 'abbyy' files generated? (was Re: Epubs with missing pages)

Jon Leech Fri, 06 Nov 2015 01:02:44 -0800

On Thu, Nov 05, 2015 at 10:16:04PM -0500, Tom Morris wrote:
> I've got a fix in hand and will generate a pull request as soon as I have
> some test data to test with.


    It looks like the 'epub' project requires 'abbyy' OCR output as a
starting point. Is the toolchain for going from raw scans to abbyy also
available, so that we could generate our own test datasets from our own
books? I skimmed all the other internetarchive projects on GitHub, but it
wasn't apparent which of them, if any, handles the scan->abbyy steps of
the pipeline.
    Jon
