https://bugzilla.wikimedia.org/show_bug.cgi?id=42466

--- Comment #6 from [email protected] 2012-12-01 03:08:56 UTC ---
(In reply to comment #5)
> Well, not to minimize this bug but that's not true, it's only that they won't
> be able to rely on the text layer.  Frequently, the layer is such crap,
> especially on older texts, that this has no practical effect.  Furthermore, we
> have our own OCR tool...

The OCR text layer that comes along with a file uploaded from a source such 
as the Internet Archive is far superior to what gets generated by our OCR tool. 
If we have to rely on our own OCR tool, it will greatly increase the work that 
has to be done cleaning up problems generated by the OCR. Our own tool is 
prone to far more stupid mistakes.

And we don't do many of the "older texts" (17th century and earlier), at least 
not on the English Wikisource.

The community also feels (has expressly stated and agrees) that we should not 
work on newly uploaded texts until the bug is corrected, because we can't judge 
the accuracy of a match against the text layer, nor can we spot problems in a 
page to text match for edited files.  As one member put it: 

"[Our OCR tool is] only intended for one-off use when a single page is missing 
text or has a very poor text-layer and not for every page in a work that 
already has a text-layer."

And another:

"As history has taught us many times before - if you work a file while in a 
state of error, you might not like the result of your misplaced efforts once 
the error is resolved."

So, whatever you might think about it, the problem is choking off community 
work. The relevant discussion thread is in Wikisource's Scriptorium under the 
headers "Anyone having trouble pulling text layers?" and "Index text pages".

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
You are on the CC list for the bug.

_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
