Re: [Wikitech-l] Fwd: wikicaptcha on GitHub

2012-02-02 Thread Nikola Smolenski
On 25/01/12 10:23, Federico Leva (Nemo) wrote: OCRs generally work by finding lines of text on a page, splitting the lines into letters, then recognizing each letter separately. So, for each letter of the recognized text, an OCR would know its bounding box on the page.
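As a rough illustration of the per-letter data described above, here is a minimal Python sketch. It assumes an OCR could hand back one bounding box per recognized character; the `LetterBox` type, the `word_boxes` helper, and the pixel coordinates are hypothetical and do not correspond to any existing OCR's export format. It merges letter boxes into word boxes, which is the kind of region a captcha could crop from a page scan.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in page pixels


@dataclass(frozen=True)
class LetterBox:
    """One recognized character and its bounding box (hypothetical format)."""
    char: str
    left: int
    top: int
    right: int
    bottom: int


def _merge(group: List[LetterBox]) -> Tuple[str, Box]:
    # The word's box is the smallest rectangle covering all of its letters.
    text = "".join(letter.char for letter in group)
    box = (
        min(letter.left for letter in group),
        min(letter.top for letter in group),
        max(letter.right for letter in group),
        max(letter.bottom for letter in group),
    )
    return text, box


def word_boxes(letters: List[LetterBox]) -> List[Tuple[str, Box]]:
    """Group consecutive non-space letters into words with merged boxes."""
    words: List[Tuple[str, Box]] = []
    current: List[LetterBox] = []
    for letter in letters:
        if letter.char.isspace():
            if current:
                words.append(_merge(current))
                current = []
        else:
            current.append(letter)
    if current:
        words.append(_merge(current))
    return words


# Made-up coordinates for the recognized line "to be".
line = [
    LetterBox("t", 10, 5, 18, 20),
    LetterBox("o", 19, 9, 27, 20),
    LetterBox(" ", 28, 5, 33, 20),
    LetterBox("b", 34, 4, 42, 20),
    LetterBox("e", 43, 9, 51, 20),
]
print(word_boxes(line))
```

With word-level boxes in hand, a corrected word can be traced back to the exact letters it replaces in the OCR text layer.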

[Wikitech-l] Fwd: wikicaptcha on GitHub

2012-01-25 Thread Federico Leva (Nemo)
We sort of use IA's data already, because many Wikisource texts are OCR'ed on IA. If we manage to use OCR improvements within DjVu, it shouldn't be too difficult to re-upload such DjVu files to their items, and then they could do what they want with them. OCRs generally work by finding lines of text

Re: [Wikitech-l] Fwd: wikicaptcha on GitHub

2012-01-19 Thread Cristian Consonni
2012/1/15 Nikola Smolenski smole...@eunet.rs: On Wednesday 11 January 2012 18:19:14 Cristian Consonni wrote: However, to my knowledge there is not a single OCR that exports this data, nor is there a standard format for it. If an open source OCR could be modified to do this, then it would be

Re: [Wikitech-l] Fwd: wikicaptcha on GitHub

2012-01-19 Thread Tei
On 19 January 2012 11:19, Cristian Consonni kikkocrist...@gmail.com wrote: 2012/1/15 Nikola Smolenski smole...@eunet.rs: On Wednesday 11 January 2012 18:19:14 Cristian Consonni wrote: However, to my knowledge there is not a single OCR that exports this data, nor is there a standard format

Re: [Wikitech-l] Fwd: wikicaptcha on GitHub

2012-01-14 Thread Nikola Smolenski
On Wednesday 11 January 2012 18:19:14 Cristian Consonni wrote: I believe the trickiest part is creating a system to put results back in Wikisource in a semi-automated way, but having captcha reviewers may help. OCRs generally work by finding lines of text on a page, splitting the lines

[Wikitech-l] Fwd: wikicaptcha on GitHub

2012-01-11 Thread Cristian Consonni
2012/1/6 Platonides platoni...@gmail.com: Integrating this into the ConfirmEdit extension shouldn't be hard. It's the extra features that make this tricky. This system is interesting for gathering translations, but doesn't work for verifying that the answer is right. How would you verify that?

Re: [Wikitech-l] Fwd: wikicaptcha on GitHub

2012-01-11 Thread David Gerard
On 11 January 2012 17:19, Cristian Consonni kikkocrist...@gmail.com wrote: We could also decorate our captcha with "this captcha helps transcribe BOOK TITLE" + link. Hah, use it for editor recruitment! - d.

Re: [Wikitech-l] Fwd: wikicaptcha on GitHub

2012-01-11 Thread Cristian Consonni
2012/1/11 David Gerard dger...@gmail.com: On 11 January 2012 17:19, Cristian Consonni kikkocrist...@gmail.com wrote: We could also decorate our captcha with "this captcha helps transcribe BOOK TITLE" + link. Hah, use it for editor recruitment! That was the point, indeed. Cristian

Re: [Wikitech-l] Fwd: wikicaptcha on GitHub

2012-01-11 Thread Trevor Parscal
I spoke to some people at the Internet Archive about the ReCaptcha situation, and learned something interesting. Apparently, although IA provided a large dataset to ReCaptcha, they never got any data back, and then after the Google acquisition, they got shut out completely. I highly recommend we

Re: [Wikitech-l] Fwd: wikicaptcha on GitHub

2012-01-11 Thread David Gerard
On 11 January 2012 19:03, Trevor Parscal tpars...@wikimedia.org wrote: Apparently, although IA provided a large dataset to ReCaptcha, they never got any data back, and then after the Google acquisition, they got shut out completely. I highly recommend we get IA involved if at all possible -

Re: [Wikitech-l] Fwd: wikicaptcha on GitHub

2012-01-11 Thread Trevor Parscal
We have a lawyer who can help determine that. It's not obvious to me (or to you, apparently), so I guess we should get one involved. - Trevor On Wed, Jan 11, 2012 at 11:07 AM, David Gerard dger...@gmail.com wrote: On 11 January 2012 19:03, Trevor Parscal tpars...@wikimedia.org wrote:

Re: [Wikitech-l] Fwd: wikicaptcha on GitHub

2012-01-11 Thread Gregory Varnum
My amateur inquiry ( http://www.uspto.gov/patents/process/search/index.jsp ) found this: http://1.usa.gov/xCrBvq I imagine Geoff will have a much clearer idea of whether this applies and how Google treats them. :) -greg aka varnent On Jan 11, 2012, at 2:08 PM, Trevor Parscal wrote: We have a

Re: [Wikitech-l] Fwd: wikicaptcha on GitHub

2012-01-11 Thread Mark A. Hershberger
Trevor Parscal tpars...@wikimedia.org writes: Apparently, although IA provided a large dataset to ReCaptcha, they never got any data back, and then after the Google acquisition, they got shut out completely. This is why I registered FreeCaptcha.net. I read how people's effort was