Good to know. I consulted ABBYY's website, and it says one option is an "Open license for local use on workstations", but I guess it's not a FLOSS license, unfortunately.

By the way, what is the state of affairs regarding Indic languages?

Do we have a central page documenting the existing OCR pipelines used by the Wikisource community?

What should I say to a contributor who comes to me asking, "I have this old PD book in my personal library that I would like to digitize, share and proofread on Wikisource; where should I start?" Do we have an online service, for example on Tool Labs, that lets users either upload a facsimile or simply input its URL, and then launches OCR, for example backed by Tesseract?

Shouldn't we update our roadmap[1], or is there a more up-to-date document elsewhere?

[1] https://meta.wikimedia.org/wiki/Wikisource_roadmap


On 13/04/2018 at 08:28, Nahum Wengrov wrote:
I use ABBYY FineReader; I don't remember the exact version (probably 12 or 11). I bought it a few years ago, and it works perfectly for my language (Hebrew).

On Fri, Apr 13, 2018 at 2:22 AM, mathieu stumpf guntz <[email protected]> wrote:

    Thank you, Nahum,

    Could you indicate which OCR solution you are using?


    On 26/03/2018 at 17:27, Nahum Wengrov wrote:
    I frequently work offline on he.wikisource. I download the entire
    PDF file from Commons to my hard drive and OCR the page I need
    myself. One could also use Wikisource's OCR and download the text,
    I guess, page by page. Then I proofread the text in a Word
    document open in the lower half of my screen, with the PDF open
    in the upper half, where I go to the page I need in Acrobat
    Reader, and I scroll both windows up or down as needed.

    On Mon, Mar 26, 2018 at 11:21 AM, mathieu stumpf guntz
    <[email protected]> wrote:

        On 24/03/2018 at 16:22, billinghurst wrote:
        Though that would defeat the purpose of online proofreading
        with account verification. Some of the true value of our
        online process is that contribution builds a level of trust
        and knowledge and that is reflected in both our patrolling
        and the allocation of autopatrolled status.

        How would providing tools for batch work offline interfere in
        any way with that? Once the work is done, it can be uploaded
        to Wikisource with whichever account the user wants.

        Actually, to my mind, the main benefit of the online aspect
        is the peer-to-peer production model. Also, there is no need
        for a central node holding accounts in order to track the
        trust given to a particular contributor: there are digital
        signature technologies, such as GPG. Having a central node
        with a web interface just makes things easier for most
        users; it doesn't improve the trustworthiness of the
        environment. On the contrary, with a single point of
        failure, we actually rely on a weaker solution in this regard.

        Also, how would you have access to templates, and components
        like that, from offline?

        Well, that just shows how inefficient these tools are for
        continuing to contribute while offline. It's always possible
        to install MediaWiki and download the required templates,
        but currently this process seems way too complicated,
        doesn't it?


        Also we generally cannot download the images separately as
        that is usually part of the later clean-up where people have
        the technical skills.

        I'm afraid the term "image" misled your answer. It seems you
        interpreted it as picture elements within files, while I was
        talking about the files themselves.

        So yes, there is the capacity to have the text and proofread
        it, but actually checking the text against the image is not
        the sole component of proofreading, and further it would not
        be at all helpful for validation.

        There is nothing magic about working directly in a browser.
        People download and upload all the required material anyway,
        just on a page-by-page basis. The result is just as valid
        when transactions are performed at the file-repository level.

        Cheers

        _______________________________________________
        Wikisource-l mailing list
        [email protected]
        <mailto:[email protected]>
        https://lists.wikimedia.org/mailman/listinfo/wikisource-l
        <https://lists.wikimedia.org/mailman/listinfo/wikisource-l>






