Yeah, it's exactly like a "manual match-and-split" (or at least, I'm hoping it can be).

So yes, the first step is to make sure that the WD item has the two properties: one for PG ID, and one for Wikisource Index Page. Then the tool will show a link to 'transfer' the PG book to WS.
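
As a rough illustration of that check (a sketch only, not the tool's actual code), the following Python looks at an item in the shape of Wikidata's entity JSON and reports whether both properties are present. The property IDs P2034 and P1957 are my assumption for the PG ID and the Wikisource Index Page properties, and the function names are hypothetical.

```python
# Property IDs below are assumptions; verify them on Wikidata before use.
PG_EBOOK_ID = "P2034"    # assumed: Project Gutenberg ebook ID
WS_INDEX_PAGE = "P1957"  # assumed: Wikisource index page URL


def first_value(entity, prop):
    """Return the first plain value of `prop` on an entity dict, or None."""
    for statement in entity.get("claims", {}).get(prop, []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") == "value":
            return snak["datavalue"]["value"]
    return None


def transfer_candidates(entity):
    """Return (pg_id, index_page) if the item has both properties, else None."""
    pg_id = first_value(entity, PG_EBOOK_ID)
    index_page = first_value(entity, WS_INDEX_PAGE)
    if pg_id is None or index_page is None:
        return None
    return pg_id, index_page
```

An item for which `transfer_candidates` returns a pair is one where a 'transfer' link would make sense; anything else still needs one or both statements added on WD first.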

The interface shows the full PG text, from which you manually select the text for the current page. Click the button to transfer the selection to the WS text box, clean it up a bit (adding links, templates, etc.), and then save it to WS.
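
To make the select-and-transfer step concrete, here is a minimal Python sketch under my own assumptions: the selection is given as character offsets into the PG text, and the "clean up" is reduced to unwrapping hard line breaks. The function names are hypothetical, and the real tool's cleaning may do something quite different.

```python
import re


def take_selection(pg_text, start, end):
    """Split off pg_text[start:end] as the page text; return (page, remaining)."""
    if not 0 <= start <= end <= len(pg_text):
        raise ValueError("selection out of range")
    page = pg_text[start:end]
    remaining = pg_text[:start] + pg_text[end:]
    return page, remaining


def clean_page(text):
    """Unwrap hard line breaks inside paragraphs; keep blank lines as paragraph breaks."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)
```

Removing the selected chunk from the remaining text means that each transfer leaves less to search through for the next page. Links and templates would still have to be added by hand.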

I'm making a little screencast of how it works; will send the link for that to this list soon.


On 14/10/16 07:52, Alex Brollo wrote:
Back to the tool: is there some more documentation that explains, step by step, how to run it? I imagine that you need a Gutenberg text and a Wikisource Index page based on the same edition as the Gutenberg text; then the tool allows something like a "manual match and split". But perhaps I didn't understand anything... I need to see the tool at work to understand it! :-(

In its early days, it.source uploaded many books from an Italian project, LiberLiber, somewhat similar to Project Gutenberg, and we often convert those ns0-only texts into proofread ones by various tricks; so I'd like to learn whatever I can from Sam's tool.


2016-10-14 12:55 GMT+02:00 Anika Born:

    Hi Alex,

    My comment was not about spending some time on a PG project versus
    not spending any time at all.

    The point/question (when it comes to de-WS) is a different one:

    (A) whether to put our valuable contributions into a project
    that is already freely available (in another format), or to spend
    this time on a (related) project that is NOT already freely
    available (and we do have a lot of them)

        // note, it is not about not spending any time on proofreading
        or on the Wikisource project... it is about finding valuable
        projects/texts to invest our time in...

    + (B) whether to spend this time on a project that may cost us the
    findability of the whole Wikisource project (and of all other texts
    on Wikisource), because Google/Bing/others tag us as a
    fork/reuser/copy of ... (as happened in the past, at least with
    de, when we had some texts of the commercial that is also supported by ABBY with a
    free software license)


    2016-10-14 10:13 GMT+02:00 Alex Brollo:

        I too am very interested in both the idea and its
        technical implementation, but I need some more documentation
        for dummies to understand it fully :-(

        About importing already-proofread texts into Wikisource: a
        text in Wikisource is different from a similar text on
        another web site, since it is "a node in the wiki network", and
        this goal IMHO deserves some pain to proofread (and re-format)
        it again, adding lots of wiki cross-links.


        2016-10-14 8:27 GMT+02:00 Andrea Zanni:

            I think the idea is good,
            but I would like to try it on my own Wikisource:
            could you also manage to include the few Italian books that
            PG has?

            On Fri, Oct 14, 2016 at 8:23 AM, Anika Born:

                corr1: [...] does not ha*ve*/show the scans, [...]


                2016-10-14 8:18 GMT+02:00 Anika Born:

                    Hi Sam,

                    would be good, cause PG does not hat/show the scans,


                    as I remember there was/is a policy at
                    <> to not use texts from other
                    projects (say: if there is text A in PG, there
                    won't be a similar text A in de.WS),

                    because at the time when de.WS did use PG texts,
                    Google decided that WS was a mirror of PG, and all
                    other (non-PG) texts were left out of the Google
                    search results as well... The (small) visibility of
                    WS got lost completely... That is the reason why
                    there are no new projects on de-WS about texts that
                    are available in a (nearly) similar project

                    (besides the effort: why spend so much time on
                    a text that is already available? - you'd have to
                    proofread it at least two times)

                    But that is this special German-thing.....

                    What do the others think about it?

                    2016-10-14 3:20 GMT+02:00 Sam Wilson:

                        Hi all,

                        I've been tinkering with an idea I've had for
                        importing Project Gutenberg books into

                        The idea is that, if Wikidata makes a link
                        between a PG ID number and a Wikisource Index
                        page, then we can go through that Index page
                        one page at a time, and copy the page's text
                        from the PG book to the WS page.

                        The interface so far isn't very brilliant, but
                        I'm just trying to figure out if this is
                        worthwhile or not. Basically, it's a matter of
                        selecting the right chunk of text in the
                        right-most text box (the full PG text) and
                        hitting the button to move it left into the
                        centre box. Then cleaning it up (manually and
                        with the magic cleaning button) to make it
                        match the image, and then uploading it to

                        It's a bad tool though, because it doesn't
                        handle the running header, and the copy-across
                        button doesn't do nice things with {{hws}}
                        etc. — not to mention all the other things it
                        doesn't do.

                        Anyway, just thought I'd mention it. :-)
                        Anyone think this is an avenue worth
                        exploring? Certainly I'd love to be able to
                        say we've got everything PG has /and more/!


                        PS changes made by this tool are all tagged as
                        "OAuth CID: 638" —


                        Wikisource-l mailing list
