On 5 Jan 2012, at 08:08, Ralf Stephan wrote:

> On Jan 4, 2012, at 4:29 PM, Lars Aronsson wrote:
>> One problem is if older scans were OCRed with older
>> software and worse results. Should one go back and
>> run a new OCR on these? Perpetually every 5 years?
>
> What's easier? Replace the OCR or write rules that only catch
> the quirks of a specific OCR software+language+font combination?
> Clearly the former, IMHO.
The 5-years-later OCR will not necessarily be better... and if any human proofreading has been done on the text, it would be wrong to override it with the new OCR. With a human proofreading UI, it seems essential to be able to "approve" pages even if no errors are found, to keep future OCR runs from changing them.

- L

_______________________________________________
Ol-discuss mailing list
[email protected]
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-discuss
To unsubscribe from this mailing list, send email to [email protected]
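The "approve even if no errors are found" idea from the reply above can be sketched as a simple rule. A fresh OCR result may replace a page's text only if no human has ever reviewed that page. This is a minimal illustration under stated assumptions. The three page states and the names `PageStatus`, `Page`, and `apply_new_ocr` are hypothetical, not part of any existing proofreading system. The sketch deliberately does not try to judge whether the new OCR is actually better, since the reply points out that it may not be.

```python
from dataclasses import dataclass
from enum import Enum


class PageStatus(Enum):
    """Hypothetical proofreading states for one scanned page."""
    OCR_ONLY = "ocr_only"   # machine text that no person has reviewed
    EDITED = "edited"       # a human corrected at least one error
    APPROVED = "approved"   # a human checked the page and found nothing to fix


@dataclass
class Page:
    text: str
    status: PageStatus = PageStatus.OCR_ONLY


def apply_new_ocr(page: Page, new_text: str) -> bool:
    """Overwrite the page text with a new OCR result only if no human has
    reviewed the page. Returns True if the page was updated."""
    if page.status is not PageStatus.OCR_ONLY:
        # Human-reviewed text (edited or explicitly approved) takes priority
        # over any later OCR run, even one from newer software.
        return False
    page.text = new_text
    return True


# Usage: only the unreviewed page picks up the re-OCR text.
pages = [Page("Tbe quick fox"), Page("The quick fox", PageStatus.APPROVED)]
updated = [apply_new_ocr(p, "The quick fox") for p in pages]
```

An explicit `APPROVED` state is what separates "a person looked and it was fine" from "nobody has looked yet". Without it, both cases would appear as unedited pages, and a re-OCR pass would overwrite them.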
