On Sat, Jul 11, 2015 at 9:59 AM, Andrea Zanni <[email protected]>
wrote:
> uh, that sounds very interesting.
> Right now, we mainly use the OCR from the DjVu files from the Internet
> Archive (that means ABBYY FineReader, which is very nice).
>
Yes, the output is generally good. But as far as I can tell, the archive's
Open Library API does not offer a way to retrieve the OCR output
programmatically, and certainly not for an arbitrary page rather than the
whole item. What I'm working on requires the ability to OCR a single page
on demand.
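For concreteness, here is a minimal sketch of what on-demand single-page OCR
could look like, assuming the DjVu file is already available locally and that
djvulibre's ddjvu tool and Tesseract are installed. Both tool choices are my
assumptions for illustration; the message above does not name a specific
OCR pipeline, and the file name "book.djvu" is only an example.

```python
def single_page_ocr_cmds(djvu_path, page, lang="eng", out_base="page"):
    """Build the two shell commands for OCRing one page of a DjVu file:
    first render the requested page to a TIFF with ddjvu, then run
    Tesseract on that image. Returns the commands as argument lists
    suitable for subprocess.run()."""
    # ddjvu (from djvulibre) can extract a single page as an image.
    render = ["ddjvu", "-format=tiff", f"-page={page}",
              djvu_path, f"{out_base}.tif"]
    # Tesseract then OCRs the page image; -l selects the language model.
    ocr = ["tesseract", f"{out_base}.tif", out_base, "-l", lang]
    return render, ocr
```

Building the commands as argument lists (rather than shell strings) keeps
file names with spaces safe and makes the pipeline easy to call per page,
which is the "single page on demand" requirement above.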
> But ideally we could think of a "customizable" OCR software that gets
> trained language per language: that would be extremely useful for
> Wikisources.
>
> (I can also imagine dividing, inside every language, per century,
> because languages too change over time ;-)
>
Indeed.
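The per-language, per-century training idea could be sketched as a simple
model-selection step, assuming a Tesseract-style setup where each trained
model has a name. The century-specific model names below are hypothetical:
stock Tesseract only ships per-language models (e.g. "ita", "fra"), so any
century-trained variants would have to be produced by the community.

```python
# Hypothetical inventory of installed OCR models: stock per-language
# models plus community-trained, century-specific variants.
AVAILABLE_MODELS = {
    "ita", "fra", "lat",       # stock Tesseract language models
    "ita_1700s", "fra_1800s",  # hypothetical century-trained variants
}

def pick_model(lang, century=None):
    """Prefer a century-specific model when one exists, falling back to
    the plain language model. `century` is given as an integer, e.g. 17
    for the 1700s."""
    if century is not None:
        candidate = f"{lang}_{century}00s"
        if candidate in AVAILABLE_MODELS:
            return candidate
    if lang in AVAILABLE_MODELS:
        return lang
    raise ValueError(f"no OCR model for language {lang!r}")
```

The fallback matters in practice: a Wikisource could start benefiting from
century-specific models as they are trained, page by page, without breaking
OCR for periods that only have the generic language model.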
A.
--
Asaf Bartov
Wikimedia Foundation <http://www.wikimediafoundation.org>
Imagine a world in which every single human being can freely share in the
sum of all knowledge. Help us make it a reality!
https://donate.wikimedia.org
_______________________________________________
Wikisource-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikisource-l