2011/2/5 River Tarnell <r.tarn...@ieee.org>

> In article <AANLkTikWLU5Y8C2UokYRN=v1-zwhb1kthnxi4xtbm...@mail.gmail.com>,
> David Gerard <dger...@gmail.com> wrote:
> >On 5 February 2011 15:12, Alex Brollo <alex.bro...@gmail.com> wrote:
> >> Just to let you know that Aubrey just presented it.source idea for
> >> wikicaptcha into wikisource-l
> >What would it take to get this into place? What's the captcha load on
> >WMF sites? Would e.g. the toolserver melt under the load? Perhaps on
> >one project at a time?
>
> I don't think this should be hosted on the Toolserver; as CAPTCHAs are a
> core part of the site, they should not rely on the TS to work.
>
> - river.

IMHO, this could be an opportunity to rethink the role of Commons as a central library. I imagine something like this:

1. as soon as a DjVu file with a text layer is uploaded, the text layer of every page is extracted, and the word coordinates are saved too;
2. a script could browse these text layers, extracting all words marked as doubtful (usually with a ^ character), as well as words that don't match a good dictionary;
3. a dynamic reCAPTCHA database is updated, and the word images are submitted to wiki contributors, both as a formal captcha for edits by logged-out users and as a volunteer task to help the Wikisource projects; the resulting corrections will fix the text files;
4. a tool should be built to upload the "pure text" from these text files into any Wikisource project;
5. finally, the refined text could be re-uploaded into the DjVu file, converting it into a "DjVu file with a wiki text layer".

Alex

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
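
P.S. To make step 2 of the list above a bit more concrete, here is a rough sketch of how the doubtful words could be picked out of an already extracted text layer. Everything in it is an assumption for illustration only: the `Word` record, the `DOUBT_MARK` constant and the function names are hypothetical, the dictionary is just a set of lowercase words, and the real layout of the extracted text layer may well differ.

```python
from dataclasses import dataclass
import re


@dataclass(frozen=True)
class Word:
    """One word of a page text layer plus its bounding box (hypothetical layout)."""
    text: str
    page: int
    x: int
    y: int
    width: int
    height: int


# Assumed marker for words the OCR engine flagged as uncertain.
DOUBT_MARK = "^"


def normalize(text):
    """Drop the doubt marker, strip surrounding punctuation, and lowercase."""
    cleaned = text.replace(DOUBT_MARK, "")
    return re.sub(r"^\W+|\W+$", "", cleaned).lower()


def is_doubtful(word, dictionary):
    """A word is doubtful if it carries the marker or is missing from the dictionary."""
    if DOUBT_MARK in word.text:
        return True
    core = normalize(word.text)
    # Empty tokens and plain numbers are not worth asking a human about.
    if not core or core.isdigit():
        return False
    return core not in dictionary


def captcha_candidates(words, dictionary):
    """Return the words (with coordinates) that should go into the captcha queue."""
    return [w for w in words if is_doubtful(w, dictionary)]


if __name__ == "__main__":
    dictionary = {"the", "quick", "brown", "fox"}
    page = [
        Word("The", 1, 10, 10, 30, 12),
        Word("qu^ck", 1, 45, 10, 48, 12),
        Word("brown,", 1, 98, 10, 55, 12),
        Word("f0x", 1, 158, 10, 28, 12),
        Word("1911", 1, 190, 10, 36, 12),
    ]
    for w in captcha_candidates(page, dictionary):
        print(w.text, (w.x, w.y, w.width, w.height))
```

Since every candidate keeps its page number and bounding box, the word image for step 3 could be cropped straight from the page scan, and a corrected word could later be written back to the same position for step 5.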