Lars Aronsson, 12/03/19 22:27:
Is spell checking (and a normal dictionary) the only useful tool?

I'm not sure it's the only or most useful, but it's definitely common.

Would you count the number of spelling errors, or the ratio
of errors to correct words? Has anyone done this?

It's routinely done by OCR software and in fact, if I remember correctly, such information is stored in DjVu files (the uncertain words are marked).

In practice, such information is most useful when preparing the files for upload to Wikisource, because you can minimise your manual work by checking that the OCR was mostly successful and, if not, trying again with different settings. I thought we had such information in <https://en.wikisource.org/wiki/Category:File_creation_help> but it seems not.
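To make the "ratio of errors to correct words" idea concrete, here is a minimal sketch in Python. It assumes the OCR output is available as plain text and that you have a word list loaded as a set; the function name `dictionary_miss_ratio` and the tiny word list are made up for illustration, and this is not a description of what any particular OCR software does internally.

```python
import re


def dictionary_miss_ratio(text, dictionary):
    """Return the share of word tokens not found in `dictionary` (0.0 to 1.0)."""
    # Split into lowercase word tokens; tokens made only of digits
    # (years, page numbers) are skipped rather than counted as errors.
    tokens = [t for t in re.findall(r"\w+", text.lower()) if not t.isdigit()]
    if not tokens:
        return 0.0
    misses = sum(1 for t in tokens if t not in dictionary)
    return misses / len(tokens)


# Hypothetical tiny word list; a real check would load a full dictionary.
WORDS = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}

clean = "The quick brown fox jumps over the lazy dog"
noisy = "Tbe qu1ck brown fox jumps ovcr the lazy dog"

print(dictionary_miss_ratio(clean, WORDS))
print(dictionary_miss_ratio(noisy, WORDS))
```

A low ratio suggests the OCR run was mostly successful; a high one suggests retrying with different settings before upload. Any threshold would have to be chosen per language and per work, since proper nouns and archaic spellings also count as misses here.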

We discussed this long ago, but I don't remember when. <https://strategy.wikimedia.org/wiki/Proposal:Make_Wikisource_scale> reminds me that <https://www.nla.gov.au/australian-newspaper-plan> had an impressive crowdsourcing effort a decade ago, but I can't tell whether it died.

OCR assessment is a well-researched issue, so you'll find things like <https://www.digitisation.eu/glossary/ground-truth/>, but not so much about how to organise a transcription project: <http://succeed-project.eu/wiki/index.php/TPDL_Tutorial_State-of-the-art_tools_for_text_digitisation#2._OCR_and_Post-correction> claims that VTL was the most promising method back then, but in 2015 we found it had already ended in 2014.
<https://lists.wikimedia.org/pipermail/wikisource-l/2015-October/002516.html>

Federico

_______________________________________________
Wikisource-l mailing list
Wikisource-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikisource-l
