2011/2/5 River Tarnell <r.tarn...@ieee.org>

> In article <AANLkTikWLU5Y8C2UokYRN=v1-zwhb1kthnxi4xtbm...@mail.gmail.com>,
> David Gerard  <dger...@gmail.com> wrote:
> >On 5 February 2011 15:12, Alex Brollo <alex.bro...@gmail.com> wrote:
> >> Just to let you know that Aubrey just presented the it.source idea for
> >> wikicaptcha on wikisource-l
> >What would it take to get this into place? What's the captcha load on
> >WMF sites? Would e.g. the toolserver melt under the load? Perhaps on
> >one project at a time?
>
> I don't think this should be hosted on the Toolserver; as CAPTCHAs are a
> core part of the site, they should not rely on the TS to work.
>
>        - river.
>
>
IMHO, this could be an opportunity to rethink the role of Commons as a
central library. I imagine something like this:

1. as soon as a djvu file with a text layer is uploaded, the text layer of
every page is extracted, together with the coordinates of each word;
2. a script could then scan these text layers, extracting all words
marked as doubtful (usually with a ^ character), as well as words
that don't match a good dictionary;
3. a dynamic recaptcha database is updated, and word images are submitted to
wiki contributors, both as a formal captcha for edits by logged-out users
and as a volunteer task to help wikisource projects; the answers are used
to fix the text files;
4. a tool should be built to upload "pure text" from these text files into
any wikisource project;
5. finally, the refined text could be re-uploaded into the djvu file, turning
it into a "djvu file with a wiki text layer".
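
To make steps 1 and 2 a bit more concrete, here is a rough Python sketch, not
a finished tool. It assumes the text layer of a page has already been dumped
as s-expression entries of the form (word xmin ymin xmax ymax "text"), which
is my assumption about the dump format; the names Word, extract_words,
is_doubtful and captcha_candidates are made up for illustration, and the
dictionary is just a set of lowercase words.

```python
import re
from dataclasses import dataclass

# Assumed shape of one word entry in the text-layer dump:
#   (word xmin ymin xmax ymax "text")
WORD_RE = re.compile(
    r'\(word\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"((?:[^"\\]|\\.)*)"\)'
)


@dataclass(frozen=True)
class Word:
    page: int
    box: tuple  # (xmin, ymin, xmax, ymax), kept so the word image can be cropped later
    text: str


def extract_words(page_no, dump):
    """Step 1: collect every word of a page together with its coordinates."""
    words = []
    for m in WORD_RE.finditer(dump):
        box = tuple(int(v) for v in m.groups()[:4])
        words.append(Word(page_no, box, m.group(5)))
    return words


def is_doubtful(word, dictionary, marker="^"):
    """Step 2: a word is doubtful if the OCR marked it or the dictionary lacks it."""
    if marker in word.text:
        return True
    # Ignore surrounding punctuation and case before the dictionary lookup.
    core = word.text.strip(".,;:!?()[]\"'").lower()
    # Tokens that are not purely alphabetic (numbers, bare punctuation) are not flagged.
    return bool(core) and core.isalpha() and core not in dictionary


def captcha_candidates(words, dictionary):
    """Return the words that would be queued for the captcha database (step 3)."""
    return [w for w in words if is_doubtful(w, dictionary)]
```

Each candidate keeps its page number and bounding box, so a later stage could
crop the matching word image and, once contributors agree on a reading, write
the corrected word back at the same position.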

Alex




_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
