On 01/05/2012 02:19 PM, Laurence Penney wrote:
> With a human proofreading UI, it seems essential to be able to “approve” 
> pages even if no errors are found, to dissuade future OCR from making changes.

For my little Scandinavian e-text archive, this graph shows what
we did in 2011: http://runeberg.org/admin/klart-2011.svg

The green line at the bottom shows 22,000 scanned pages were
proofread. That's 60 pages per day, at an even pace. The blue
line is pages we scanned: only 10,000 in the first 5 months,
rising to 65,000 pages by the end of the year.
That is a typical year. Every year, we scan and OCR some 40,000
pages more than we have volunteers to proofread. The Internet
Archive scans and OCRs far more than this, but proofreading
takes so much longer per page and the number of volunteers
is limited. So there will be no shortage of untouched old OCR
that can be replaced by improved OCR some years later.
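
As a rough check of the figures above, here is a minimal sketch in
Python. The variable names are only illustrative, and the numbers
are the rounded values quoted in this message, not exact counts
from the archive.

```python
# Rounded 2011 figures quoted in this message (not exact counts).
pages_proofread = 22_000
pages_scanned = 65_000
days_in_year = 365

# Average proofreading pace over the year.
proofread_per_day = pages_proofread / days_in_year

# Pages scanned and OCRed during the year but not proofread.
backlog_growth = pages_scanned - pages_proofread

print(f"proofread per day: {proofread_per_day:.0f}")
print(f"unproofread pages added in 2011: {backlog_growth}")
```

This gives about 60 pages per day and a backlog growth of 43,000
pages, in line with the "some 40,000" mentioned above.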

In my project, since we have always run OCR semi-manually with
proofreading in mind, I don't think we need to redo our 5-year-old
OCR, but perhaps some of what we did 10 years ago. Some books
printed in blackletter/Fraktur could need a new OCR, though, if
some good software comes along.


-- 
   Lars Aronsson ([email protected])
   Aronsson Datateknik - http://aronsson.se

   Project Runeberg - free Nordic literature - http://runeberg.org/

