On Sat, Jul 11, 2015 at 9:59 AM, Andrea Zanni <[email protected]>
wrote:

> uh, that sounds very interesting.
> Right now, we mainly use OCR from djvu from Internet Archive (that means
> ABBYY Finereader, which is very nice).
>

Yes, the output is generally good.  But as far as I can tell, the archive's
Open Library API does not offer a way to retrieve the OCR output
programmatically, and certainly not for an arbitrary page rather than the
whole item.  What I'm working on requires the ability to OCR a single page
on demand.

> But ideally we could think of "customizable" OCR software that gets
> trained language by language: that would be extremely useful for
> Wikisources.
>
> (I can also imagine dividing, within each language, by century,
> because languages change over time too ;-)
>

Indeed.
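
For what it's worth, the per-page, per-language idea is roughly what an
open-source engine like Tesseract already supports (an assumption on my
part -- the thread doesn't name an engine, and the file names and language
codes below are purely illustrative):

```python
# Sketch: on-demand OCR of a single page image, selecting a per-language
# model. Assumes Tesseract is installed with the relevant traineddata
# files; paths and language codes here are illustrative only.
import subprocess


def build_command(image_path: str, lang: str) -> list:
    """Build the Tesseract command line for one page.

    Tesseract's -l flag picks a traineddata file (e.g. 'ita' for
    Italian). A custom model trained on period-specific text can be
    dropped into the tessdata directory and selected the same way,
    which is one way to realize the "per century" idea.
    """
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_page(image_path: str, lang: str = "ita") -> str:
    """OCR a single page image and return the recognized text."""
    result = subprocess.run(
        build_command(image_path, lang),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

A single-page workflow like this sidesteps the whole-item limitation:
each page image is OCRed only when a contributor actually opens it.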

   A.
-- 
    Asaf Bartov
    Wikimedia Foundation <http://www.wikimediafoundation.org>

Imagine a world in which every single human being can freely share in the
sum of all knowledge. Help us make it a reality!
https://donate.wikimedia.org
_______________________________________________
Wikisource-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikisource-l
