Re: [Wikisource-l] OCR as a service?

2015-07-29 Thread Jane Darnell
Nice! I will wait for the client though, thx. Where will the source images be stored? Labs or Commons? It would be nice if you could somehow make a client that builds a djvu file locally from the page image and the OCR text, which you can clean up before putting it into the djvu file. Now it just

Re: [Wikisource-l] OCR as a service?

2015-07-29 Thread Asaf Bartov
Hello again. So, I've set up an OpenOCR instance on Labs that's available for use as a service. Just call it and point to an image. Example: curl -X POST -H "Content-Type: application/json" -d '{"img_url":"http://bit.ly/ocrimage","engine":"tesseract"}'
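
The call described above can also be sketched in Python. This is a minimal sketch, not the service's documented client: the JSON fields (img_url, engine) come from the curl example in the message, but the endpoint URL is cut off in the archive, so OCR_ENDPOINT below is a hypothetical placeholder, and the assumption that the reply body is the recognized text is not confirmed by the message.

```python
import json
import urllib.request

# Hypothetical placeholder: the real service URL is truncated in the archived message.
OCR_ENDPOINT = "http://ocr.example.invalid/ocr"


def build_ocr_request(img_url, engine="tesseract", endpoint=OCR_ENDPOINT):
    """Build a POST request carrying the same JSON body as the curl example."""
    payload = json.dumps({"img_url": img_url, "engine": engine}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_ocr(img_url, engine="tesseract", endpoint=OCR_ENDPOINT):
    """Send the request and return the response body as text.

    Assumption: the service answers with the OCR result in the body.
    """
    request = build_ocr_request(img_url, engine, endpoint)
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8")
```

Building the request separately from sending it makes it possible to inspect the payload without touching the network; run_ocr("http://bit.ly/ocrimage", endpoint=...) would need the actual service URL.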

Re: [Wikisource-l] OCR as a service?

2015-07-12 Thread Alex Brollo
I explored abbyy gz files, the full XML output from the ABBYY OCR engine running at Internet Archive, and I was astonished by the amount of data they contain - they are stored at XCA_Extended detail (as documented at http://www.abbyy-developers.com/en:tech:features:xml ). Something that

Re: [Wikisource-l] OCR as a service?

2015-07-12 Thread Asaf Bartov
On Sat, Jul 11, 2015 at 8:44 AM, Nicolas VIGNERON vigneron.nico...@gmail.com wrote: Hi, I'm not a techie so I'm not sure I know what OCR-as-a-service is, but you should ask Tpt and Phe, who have OCR stuff on the tool labs (to know what is behind tools like

Re: [Wikisource-l] OCR as a service?

2015-07-12 Thread Asaf Bartov
On Sat, Jul 11, 2015 at 9:59 AM, Andrea Zanni zanni.andre...@gmail.com wrote: uh, that sounds very interesting. Right now, we mainly use OCR from djvu from Internet Archive (that means ABBYY FineReader, which is very nice). Yes, the output is generally good. But as far as I can tell, the

Re: [Wikisource-l] OCR as a service?

2015-07-12 Thread billinghurst
OCR is available via a JavaScript script. A number of Wikisources have it enabled as a gadget, though I cannot speak for all the wikis. I presume it relates to the languages available in the OCR. The script is noted at https://wikisource.org/wiki/Wikisource:Shared_Scripts Regards, Billinghurst On Sun, Jul

Re: [Wikisource-l] OCR as a service?

2015-07-11 Thread Alex Brollo
Very, very interesting. I can't help you, my skill is very limited, but I'm very interested in it and I hope that my interest will be widely shared. Alex 2015-07-11 12:04 GMT+02:00 Asaf Bartov abar...@wikimedia.org: Hi. Speaking of Wikisource software, do we already have any instance

Re: [Wikisource-l] OCR as a service?

2015-07-11 Thread Andrea Zanni
uh, that sounds very interesting. Right now, we mainly use OCR from djvu from Internet Archive (that means ABBYY FineReader, which is very nice). But ideally we could think of a customizable OCR software that gets trained language per language: that would be extremely useful for Wikisources. (i