https://bugzilla.wikimedia.org/show_bug.cgi?id=42466
--- Comment #6 from [email protected] 2012-12-01 03:08:56 UTC ---

(In reply to comment #5)
> Well, not to minimize this bug but that's not true, it's only that they won't
> be able to rely on the text layer. Frequently, the layer is such crap,
> especially on older texts, that this has no practical effect. Furthermore, we
> have our own OCR tool...

The OCR text layer that comes along with a file uploaded from a source such as the Internet Archive is far superior to what gets generated by our OCR tool. If we have to rely on our own OCR tool, it will greatly increase the work that has to be done cleaning up problems generated by the OCR. Our own tool is prone to far more stupid mistakes. And we don't do many of the "older texts" (17th century and earlier), at least not on the English Wikisource.

The community also feels (has expressly stated and agrees) that we should not work on newly uploaded texts until the bug is corrected, because we can't judge the accuracy of a match against the text layer, nor can we spot problems in a page to text match for edited files.

As one member put it: "[Our OCR tool is] only intended for one-off use when a single page is missing text or has a very poor text-layer and not for every page in a work that already has a text-layer."

And another: "As history as taught us many times before - if you work a file while in a state of error, you might not like the result of your misplaced efforts once the error is resolved."

So, whatever you might think about it, the problem is choking off community work.

The relevant discussion thread is in Wikisource's Scriptorium under the headers "Anyone having trouble pulling text layers?" and "Index text pages".
