On Sat, Jul 11, 2015 at 9:59 AM, Andrea Zanni <[email protected]>
wrote:
> uh, that sounds very interesting.
> Right now, we mainly use the OCR from the DjVu files from the Internet
> Archive (that means ABBYY FineReader, which is very nice).
>
Yes, the output is generally good. But as far as I can tell, the archive's
Open Library API does not offer a way to retrieve the OCR output
programmatically, and certainly not for an arbitrary page rather than the
whole item. What I'm working on requires the ability to OCR a single page
on demand.
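For concreteness, here is a minimal sketch of what on-demand single-page OCR
could look like, assuming the DjVu file is already available locally and that
djvulibre's ddjvu tool and Tesseract are installed. Both tool choices are my
assumptions for illustration; the message above does not name a specific
OCR pipeline, and the file name "book.djvu" is only an example.

```python
def single_page_ocr_cmds(djvu_path, page, lang="eng", out_base="page"):
    """Build the two shell commands for OCRing one page of a DjVu file:
    first render the requested page to a TIFF with ddjvu, then run
    Tesseract on that image. Returns the commands as argument lists
    suitable for subprocess.run()."""
    # ddjvu (from djvulibre) can extract a single page as an image.
    render = ["ddjvu", "-format=tiff", f"-page={page}",
              djvu_path, f"{out_base}.tif"]
    # Tesseract then OCRs the page image; -l selects the language model.
    ocr = ["tesseract", f"{out_base}.tif", out_base, "-l", lang]
    return render, ocr
```

Building the commands as argument lists (rather than shell strings) keeps
file names with spaces safe and makes the pipeline easy to call per page,
which is the "single page on demand" requirement above.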
> But ideally we could think of a "customizable" OCR software that gets
> trained language per language: that would be extremely useful for
> Wikisources.
>
> (I can also imagine dividing, inside every language, per century,
> because languages too change over time ;-)
>
Indeed.
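The per-language, per-century training idea could be sketched as a simple
model-selection step, assuming a Tesseract-style setup where each trained
model has a name. The century-specific model names below are hypothetical:
stock Tesseract only ships per-language models (e.g. "ita", "fra"), so any
century-trained variants would have to be produced by the community.

```python
# Hypothetical inventory of installed OCR models: stock per-language
# models plus community-trained, century-specific variants.
AVAILABLE_MODELS = {
    "ita", "fra", "lat",       # stock Tesseract language models
    "ita_1700s", "fra_1800s",  # hypothetical century-trained variants
}

def pick_model(lang, century=None):
    """Prefer a century-specific model when one exists, falling back to
    the plain language model. `century` is given as an integer, e.g. 17
    for the 1700s."""
    if century is not None:
        candidate = f"{lang}_{century}00s"
        if candidate in AVAILABLE_MODELS:
            return candidate
    if lang in AVAILABLE_MODELS:
        return lang
    raise ValueError(f"no OCR model for language {lang!r}")
```

The fallback matters in practice: a Wikisource could start benefiting from
century-specific models as they are trained, page by page, without breaking
OCR for periods that only have the generic language model.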
A.
--
Asaf Bartov
Wikimedia Foundation <http://www.wikimediafoundation.org>
Imagine a world in which every single human being can freely share in the
sum of all knowledge. Help us make it a reality!
https://donate.wikimedia.org
_______________________________________________
Wikisource-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikisource-l