Re: [Wikisource-l] Importing books from Project Gutenberg

Sam Wilson Sat, 15 Oct 2016 16:21:07 -0700

That's a really good point, Anika. I'd not considered that having PG books could be detrimental to Wikisource! :-(

I guess the reverse could also be true? That Google might think that PG is a mirror of WS, and decrease PG's page-rank. Either way, not great.

How can I investigate whether this is occurring? How did you figure it out for de.ws?

As for replicating the effort: I figure that if there are people interested in doing it, then why not! :-) Personally, I want to make Wikisource the best digital library it can be, and when I show it to people and they say "oh but you haven't got all of Dickens" or something, then I want to fix that. And it seems that importing other existing (free and open) digital libraries can help with this in a quicker fashion than straight-up proofreading. But I totally can see why people wouldn't want to spend time doing it! And that's cool.


:-)

—Sam


On 14/10/16 03:55, Anika Born wrote:

Hi Alex,

My comment was not about spending some time on a PG project versus not spending any time at all.


The point/question (when it comes to de-WS) is a different one:

(A) Should we spend some of our valuable contributions on a project that is already freely available (in another format), or spend this time on a (related) project that is NOT already freely available? (And we do have a lot of them.)


    // note, it is not about not spending any time on proofreading or
    the Wikisource project... it is about finding valuable
    projects/texts to invest our time in...

+ (B) Should we spend this time on a project that may cost us the findability of the whole Wikisource project (and all other texts on Wikisource), because Google/Bing/others tag us as a fork/reuser/copy of ... (as happened in the past, at least with de, when we had some texts from the commercial http://gutenberg.spiegel.de/, which is also supported by ABBY with a free software license)



Anika

2016-10-14 10:13 GMT+02:00 Alex Brollo <[email protected]>:


    I too am very interested in both the idea and its technical
    implementation, but I need some more docs for dummies to understand
    it fully :-(

    About importing already-proofread texts into Wikisource: a text
    in Wikisource is different from a similar text on another web
    site, since it is "a node in the wiki network", and this goal
    deserves, IMHO, some pain to proofread (and re-format) it again,
    adding lots of wiki cross links.

    Alex


    2016-10-14 8:27 GMT+02:00 Andrea Zanni <[email protected]>:

        I think the idea is good,
        but I would like to try that in my wikisource:
        could you also manage to take the few Italian books that PG has?
        Thanks!

        On Fri, Oct 14, 2016 at 8:23 AM, Anika Born
        <[email protected]> wrote:

            corr1: [...] does not ha*ve*/show the scans, [...]

            Anika

            2016-10-14 8:18 GMT+02:00 Anika Born <[email protected]>:

                Hi Sam,

                would be good, cause PG does not hat/show the scans,

                But

                as I remember, there was/is a policy at de.ws to not
                use texts from other projects (say: if there is text
                A in PG, there won't be a similar text A in de.WS),

                because at the time de.WS did use PG texts... Google
                said WS is a mirror of PG, and all other (non-PG)
                texts were left out of Google search results as
                well... The (small) visibility of WS got lost
                completely...

                That is the reason why there are no new projects on
                de-WS about texts that are available in a (nearly)
                similar project

                (besides the effort: why spend so much time on a
                text that is already available? - you'd have to
                proofread it at least two times)


                But that is this special German-thing.....


                What do the others think about it?
                Anika

                2016-10-14 3:20 GMT+02:00 Sam Wilson <[email protected]>:

                    Hi all,

                    I've been tinkering with an idea I've had for
                    importing Project Gutenberg books into Wikisource:
                    http://tools.wmflabs.org/pg2ws/

                    The idea is that, if Wikidata makes a link between
                    a PG ID number and a Wikisource Index page, then
                    we can go through that Index page one page at a
                    time, and copy the page's text from the PG book to
                    the WS page.
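As a rough illustration of the lookup described above, here is a minimal sketch that builds a SPARQL query for items carrying both a PG ID and a Wikisource Index page. It is not taken from the pg2ws tool. The function name is hypothetical, and the property numbers (P2034 for the Project Gutenberg ebook ID, P1957 for the Wikisource index page URL) are assumptions that should be checked on Wikidata before use. The sketch only builds the query text; sending it to a query service is left out.

```python
# Hypothetical helper, not part of pg2ws.
# Assumption: P2034 = "Project Gutenberg ebook ID",
#             P1957 = "Wikisource index page URL". Verify both on Wikidata.

def build_pg_index_query(pg_id: str) -> str:
    """Return SPARQL text that finds items with the given PG ID and an Index page."""
    if not pg_id.isdigit():
        raise ValueError(f"PG ebook IDs are expected to be numeric, got {pg_id!r}")
    return (
        "SELECT ?item ?indexPage WHERE {\n"
        f'  ?item wdt:P2034 "{pg_id}" .\n'
        "  ?item wdt:P1957 ?indexPage .\n"
        "}"
    )


if __name__ == "__main__":
    # "1400" is just an arbitrary example ID.
    print(build_pg_index_query("1400"))
```

Each result row would pair one item with the Index page whose pages could then be filled one at a time.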

                    The interface so far isn't very brilliant, but I'm
                    just trying to figure out if this is worthwhile or
                    not. Basically, it's a matter of selecting the
                    right chunk of text in the right-most text box
                    (the full PG text) and hitting the button to move
                    it left into the centre box. Then cleaning it up
                    (manually and with the magic cleaning button) to
                    make it match the image, and then uploading it to
                    Wikisource.

                    It's a bad tool though, because it doesn't handle
                    the running header, and the copy-across button
                    doesn't do nice things with {{hws}} etc. — not to
                    mention all the other things it doesn't do.
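To make the {{hws}} problem concrete: PG text has words rejoined, so when a page break in the scan falls inside a word, the two halves have to be wrapped by hand. The sketch below shows one way a copy-across step could emit the markup. The function name is hypothetical, and it assumes the usual {{hws|start|whole word}} / {{hwe|end|whole word}} parameter order and that the caller already knows where the break falls.

```python
from typing import Tuple


def split_hyphenated_word(word: str, break_at: int) -> Tuple[str, str]:
    """Return markup for the end of one page and the start of the next.

    Assumes the template forms {{hws|start|whole}} and {{hwe|end|whole}}.
    break_at is the index in `word` where the page break falls.
    """
    if not 0 < break_at < len(word):
        raise ValueError("break_at must fall strictly inside the word")
    start, end = word[:break_at], word[break_at:]
    # Doubled braces in an f-string produce literal braces.
    return (f"{{{{hws|{start}|{word}}}}}", f"{{{{hwe|{end}|{word}}}}}")


if __name__ == "__main__":
    page_end, next_page_start = split_hyphenated_word("hyphenated", 4)
    print(page_end)         # {{hws|hyph|hyphenated}}
    print(next_page_start)  # {{hwe|enated|hyphenated}}
```

Finding the break position automatically is the hard part and is not attempted here.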

                    Anyway, just thought I'd mention it. :-) Anyone
                    think this is an avenue worth exploring? Certainly
                    I'd love to be able to say we've got everything PG
                    has /and more/!

                    —Sam

                    PS changes made by this tool are all tagged as
                    "OAuth CID: 638" —

                    
https://en.wikisource.org/w/index.php?title=Special:RecentChanges&tagfilter=OAuth+CID%3A+638
                    


                    _______________________________________________
                    Wikisource-l mailing list
                    [email protected]
                    https://lists.wikimedia.org/mailman/listinfo/wikisource-l




