On 25/01/12 10:23, Federico Leva (Nemo) wrote:
OCRs generally work by finding lines of text on a page, splitting the lines
into letters, then recognizing each letter separately. So, an OCR would know,
for each letter of the recognized text, what is its bounding box on the page.
We sort of use IA's data already, because many Wikisource texts are
OCR'ed on IA. If we manage to use OCR improvements within DjVu, it
shouldn't be too difficult to reupload such DjVu in their items and then
they could do what they want with them.
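To make the per-letter bounding boxes mentioned above concrete, here is a
minimal sketch of how such data might be represented. The names Box, Glyph,
line_text and line_box are hypothetical, made up for illustration; this is
not the output format of any particular OCR engine or of DjVu.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page pixel coordinates (hypothetical layout)."""
    left: int
    top: int
    right: int
    bottom: int

    def union(self, other: "Box") -> "Box":
        # Smallest rectangle that covers both boxes.
        return Box(
            min(self.left, other.left),
            min(self.top, other.top),
            max(self.right, other.right),
            max(self.bottom, other.bottom),
        )


@dataclass(frozen=True)
class Glyph:
    """One recognized letter together with its position on the page."""
    char: str
    box: Box


def line_text(glyphs: List[Glyph]) -> str:
    # Read the letters left to right to recover the text of the line.
    ordered = sorted(glyphs, key=lambda g: g.box.left)
    return "".join(g.char for g in ordered)


def line_box(glyphs: List[Glyph]) -> Box:
    # The line's bounding box is the union of its letters' boxes.
    box = glyphs[0].box
    for g in glyphs[1:]:
        box = box.union(g.box)
    return box


glyphs = [
    Glyph("b", Box(12, 5, 20, 30)),
    Glyph("a", Box(0, 8, 10, 30)),
]
text = line_text(glyphs)
box = line_box(glyphs)
```

With data in this shape, a corrected letter or word can be mapped back to the
exact region of the scanned page it came from.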
2012/1/15 Nikola Smolenski smole...@eunet.rs:
On Wednesday 11 January 2012 18:19:14 Cristian Consonni wrote:
However, to my knowledge there is not a single OCR that exports this data, nor
is there a standard format for it. If an open source OCR could be modified to
do this, then it would be
On 19 January 2012 11:19, Cristian Consonni kikkocrist...@gmail.com wrote:
2012/1/15 Nikola Smolenski smole...@eunet.rs:
On Wednesday 11 January 2012 18:19:14 Cristian Consonni wrote:
However, to my knowledge there is not a single OCR that exports this data, nor
is there a standard format
On Wednesday 11 January 2012 18:19:14 Cristian Consonni wrote:
I believe the trickiest part is creating a system to put results back
in Wikisource in a semi-automated way, but having captcha reviewers
may help.
2012/1/6 Platonides platoni...@gmail.com:
Integrating this into the ConfirmEdit extension shouldn't be hard. It's the
extra features that make this tricky. This system is interesting for
gathering translations, but doesn't work for verifying that the answer
is right. How would you verify that?
On 11 January 2012 17:19, Cristian Consonni kikkocrist...@gmail.com wrote:
We could also decorate our captcha with "this captcha helps
transcribing BOOK TITLE" + link.
Hah, use it for editor recruitment!
- d.
___
Wikitech-l mailing list
2012/1/11 David Gerard dger...@gmail.com:
On 11 January 2012 17:19, Cristian Consonni kikkocrist...@gmail.com wrote:
We could also decorate our captcha with "this captcha helps
transcribing BOOK TITLE" + link.
Hah, use it for editor recruitment!
That was the point, indeed.
Cristian
I spoke to some people at the Internet Archive about the ReCaptcha
situation, and learned something interesting.
Apparently, although IA provided a large dataset to ReCaptcha, they never
got any data back, and then after the Google acquisition, they got shut out
completely.
I highly recommend we
On 11 January 2012 19:03, Trevor Parscal tpars...@wikimedia.org wrote:
Apparently, although IA provided a large dataset to ReCaptcha, they never
got any data back, and then after the Google acquisition, they got shut out
completely.
I highly recommend we get IA involved if at all possible -
We have a lawyer that can help determine that. It's not obvious to me (or
you apparently) so I guess we should get one involved.
- Trevor
On Wed, Jan 11, 2012 at 11:07 AM, David Gerard dger...@gmail.com wrote:
On 11 January 2012 19:03, Trevor Parscal tpars...@wikimedia.org wrote:
My amateur inquiry ( http://www.uspto.gov/patents/process/search/index.jsp )
found this:
http://1.usa.gov/xCrBvq
I imagine Geoff will have a much clearer idea of if this applies and how Google
treats them. :)
-greg aka varnent
On Jan 11, 2012, at 2:08 PM, Trevor Parscal wrote:
We have a
Trevor Parscal tpars...@wikimedia.org writes:
Apparently, although IA provided a large dataset to ReCaptcha, they never
got any data back, and then after the Google acquisition, they got shut out
completely.
This is why I registered FreeCaptcha.net. I read how people's effort was