Re: [Wikisource-l] Importing books from Project Gutenberg

2016-10-16 Thread Alex Brollo
Ok, I'll use https://www.wikidata.org/wiki/Q27245478 as an example, and I'll
submit it to the it.source WD specialists to see if we can retrieve or add
data for a test work.

Alex



2016-10-16 1:28 GMT+02:00 Sam Wilson:

> Hm, it should work fine for it.ws too. Can you give me a WD item for a
> book with a PG ID and an it.ws Index page? I'll investigate further... :-)
>
> One cool thing that I've only recently found is this list of PG's sources:
> http://www.pgdp.net/c/tools/project_manager/show_image_sources.php (you
> need to log in)
>
> It's not very structured, but it's the only place I've found that links a
> PG ID to a scan on the Internet Archive or elsewhere. I'm thinking of
> writing a scraper to get the data so that it can at least link more PG IDs
> and IA identifiers on Wikidata.
>
> —Sam
>
> On 13/10/16 23:27, Andrea Zanni wrote:
>
> I think the idea is good,
> but I would like to try that on my Wikisource:
> could you also manage to include the few Italian books that PG has?
> Thanks!
>
> On Fri, Oct 14, 2016 at 8:23 AM, Anika Born wrote:
>
>> corr1: [...] does not ha*ve*/show the scans, [...]
>>
>> Anika
>>
>> 2016-10-14 8:18 GMT+02:00 Anika Born:
>>
>>> Hi Sam,
>>>
>>> that would be good, because PG does not hat/show the scans.
>>>
>>> But as I remember, there was/is a policy at de.ws not to use texts from
>>> other projects (say: if there is a text A in PG, there won't be a similar
>>> text A in de.WS).
>>>
>>> The reason: at the time de.WS did use PG texts, Google decided WS was a
>>> mirror of PG, and all the other (non-PG) texts were left out of the Google
>>> search results as well. The (small) visibility of WS was lost completely.
>>> That is why there are no new projects on de.WS for texts that are
>>> available in a (nearly) similar project.
>>>
>>> (Besides the effort: why spend so much time on a text that is already
>>> available? You'd have to proofread it at least two times.)
>>>
>>>
>>> But that is a special German thing.
>>>
>>>
>>> What do the others think about it?
>>> Anika
>>>
>>> 2016-10-14 3:20 GMT+02:00 Sam Wilson:
>>>
>>>> Hi all,
>>>>
>>>> I've been tinkering with an idea I've had for importing Project
>>>> Gutenberg books into Wikisource: http://tools.wmflabs.org/pg2ws/
>>>>
>>>> The idea is that, if Wikidata makes a link between a PG ID number and a
>>>> Wikisource Index page, then we can go through that Index page one page at a
>>>> time, and copy the page's text from the PG book to the WS page.
>>>>
>>>> The interface so far isn't very brilliant, but I'm just trying to
>>>> figure out if this is worthwhile or not. Basically, it's a matter of
>>>> selecting the right chunk of text in the right-most text box (the full PG
>>>> text) and hitting the button to move it left into the centre box. Then
>>>> cleaning it up (manually and with the magic cleaning button) to make it
>>>> match the image, and then uploading it to Wikisource.
>>>>
>>>> It's a bad tool though, because it doesn't handle the running header,
>>>> and the copy-across button doesn't do nice things with {{hws}} etc. — not
>>>> to mention all the other things it doesn't do.
>>>>
>>>> Anyway, just thought I'd mention it. :-) Anyone think this is an avenue
>>>> worth exploring? Certainly I'd love to be able to say we've got everything
>>>> PG has *and more*!
>>>>
>>>> —Sam
>>>>
>>>> PS changes made by this tool are all tagged as "OAuth CID: 638" —
>>>>
>>>> https://en.wikisource.org/w/index.php?title=Special:RecentChanges&tagfilter=OAuth+CID%3A+638
>>>>
>>>> ___
>>>> Wikisource-l mailing list
>>>> Wikisource-l@lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/wikisource-l




Re: [Wikisource-l] Importing books from Project Gutenberg

2016-10-15 Thread Sam Wilson
Hm, it should work fine for it.ws too. Can you give me a WD item for a
book with a PG ID and an it.ws Index page? I'll investigate further... :-)


One cool thing that I've only recently found is this list of PG's sources:
http://www.pgdp.net/c/tools/project_manager/show_image_sources.php (you 
need to log in)


It's not very structured, but it's the only place I've found that links 
a PG ID to a scan on the Internet Archive or elsewhere. I'm thinking of 
writing a scraper to get the data so that it can at least link more PG 
IDs and IA identifiers on Wikidata.
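
A minimal sketch of the extraction half of such a scraper, in Python. It assumes only that the listing contains ordinary Internet Archive links of the form `https://archive.org/details/<identifier>`; the real layout of the pgdp.net page is not described in this thread (and the page needs a login), so the sample text and the function name `extract_ia_identifiers` are hypothetical. Pairing each identifier with its PG ID would depend on that unknown layout and is not attempted here.

```python
import re

# Assumption: scans are referenced by archive.org "details" URLs.
# IA identifiers consist of letters, digits, dots, underscores and hyphens.
IA_DETAILS_RE = re.compile(
    r"https?://(?:www\.)?archive\.org/details/([A-Za-z0-9._-]+)"
)


def extract_ia_identifiers(text):
    """Return the distinct IA identifiers in text, in order of first appearance."""
    found = []
    for match in IA_DETAILS_RE.finditer(text):
        # A sentence-ending full stop can be swallowed by the pattern; drop it.
        identifier = match.group(1).rstrip(".")
        if identifier and identifier not in found:
            found.append(identifier)
    return found


if __name__ == "__main__":
    # Hand-written sample, not real data from the PGDP page.
    sample = (
        "Scans from https://archive.org/details/examplebook00auth and "
        "http://www.archive.org/details/another_item-1. See also "
        "https://archive.org/details/examplebook00auth"
    )
    print(extract_ia_identifiers(sample))
```

Keeping the identifiers in order of first appearance makes it easier to compare the output by eye with the source listing.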


—Sam




Re: [Wikisource-l] Importing books from Project Gutenberg

2016-10-15 Thread Sam Wilson
Yeah, it's exactly like a "manual match-and-split" (or at least, I'm 
hoping it can be).


So yes, the first step is to make sure that the WD item has the two 
properties: one for PG ID, and one for Wikisource Index Page. Then the 
tool will show a link to 'transfer' the PG book to WS.
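
As a rough illustration of that first check (not the tool's actual code), the sketch below inspects a Wikidata entity in the standard JSON layout, as served for example by `Special:EntityData/<QID>.json`, and reports whether both statements are present. `P1957` is the index-page property named elsewhere in this thread; `P2034` for the Project Gutenberg ebook ID is an assumption, and the function names and sample item are made up.

```python
PG_ID_PROP = "P2034"       # assumption: Project Gutenberg ebook ID
INDEX_PAGE_PROP = "P1957"  # Wikisource index page URL (named in this thread)


def first_value(entity, prop):
    """Return the first concrete value of prop in a Wikidata entity dict, or None."""
    for claim in entity.get("claims", {}).get(prop, []):
        snak = claim.get("mainsnak", {})
        # Skip "somevalue"/"novalue" snaks, which carry no datavalue.
        if snak.get("snaktype") == "value":
            return snak["datavalue"]["value"]
    return None


def transfer_candidate(entity):
    """Return (pg_id, index_url) if the item has both statements, else None."""
    pg_id = first_value(entity, PG_ID_PROP)
    index_url = first_value(entity, INDEX_PAGE_PROP)
    if pg_id is None or index_url is None:
        return None
    return pg_id, index_url


if __name__ == "__main__":
    # Hand-made sample entity, not real Wikidata content.
    sample = {
        "claims": {
            "P2034": [{"mainsnak": {"snaktype": "value",
                                    "datavalue": {"value": "12345"}}}],
            "P1957": [{"mainsnak": {"snaktype": "value",
                                    "datavalue": {"value": "https://en.wikisource.org/wiki/Index:Example.djvu"}}}],
        }
    }
    print(transfer_candidate(sample))
```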


The interface shows the full PG text, from which you manually select the
text of the current page. Click the button to transfer this to the WS text-box,
clean it up a bit (adding links, templates, etc.), and then save it to WS.
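
Part of the template clean-up could probably be mechanical. As an illustration only, here is a sketch for the `{{hws}}` case that the original post says the copy-across button does not handle: a word that the scan breaks across two pages. It assumes the English Wikisource convention of `{{hws|first part|whole word}}` at the end of one page and `{{hwe|last part|whole word}}` at the start of the next; the function name is hypothetical and is not part of pg2ws.

```python
def split_across_pages(word, split_at):
    """Return the ({{hws}}, {{hwe}}) wikitext pair for a word broken across pages.

    word     -- the whole word as it appears in the continuous PG text
    split_at -- number of characters of the word printed on the first page
    """
    if not 0 < split_at < len(word):
        raise ValueError("split_at must fall strictly inside the word")
    start, end = word[:split_at], word[split_at:]
    # Assumed template convention: fragment first, whole word second.
    hws = "{{hws|" + start + "|" + word + "}}"
    hwe = "{{hwe|" + end + "|" + word + "}}"
    return hws, hwe


if __name__ == "__main__":
    end_of_page, start_of_next = split_across_pages("implementation", 8)
    print(end_of_page)    # goes at the end of the first page
    print(start_of_next)  # goes at the start of the following page
```

Deciding where the scan actually breaks the word still needs a human looking at the page image; the sketch only removes the typing.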


I'm making a little screencast of how it works; will send the link for 
that to this list soon.


—Sam



Re: [Wikisource-l] Importing books from Project Gutenberg

2016-10-15 Thread Sam Wilson
That's a really good point Anika, I'd not considered that having PG 
books could be detrimental to Wikisource! :-(


I guess the reverse could also be true? That Google might think that PG 
is a mirror of WS, and decrease PG's page-rank. Either way, not great.


How can I investigate whether this is occurring? How did you figure it
out for de.ws?


As for replicating the effort: I figure that if there are people 
interested in doing it, then why not! :-) Personally, I want to make 
Wikisource the best digital library it can be, and when I show it to 
people and they say "oh but you haven't got all of Dickens" or 
something, then I want to fix that. And it seems that importing other 
existing (free and open) digital libraries can help with this in a 
quicker fashion than straight-up proofreading. But I totally can see why 
people wouldn't want to spend time doing it! And that's cool.


:-)

—Sam



Re: [Wikisource-l] Importing books from Project Gutenberg

2016-10-14 Thread Nicolas VIGNERON
Hi Sam,

Good idea!

For me, the Wikidata linking part seems important (maybe the most important). It's
a great tool for visualising that most books are poorly described in Wikidata (so
many P1957 statements missing!).
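
One way to put a number on the missing P1957 statements would be a query against the Wikidata Query Service. The sketch below only builds the SPARQL text and the request URL and does not send anything. `P2034` as the Project Gutenberg ebook ID is an assumption, and the function names are made up for illustration.

```python
from urllib.parse import urlencode

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"
PG_ID_PROP = "P2034"       # assumption: Project Gutenberg ebook ID
INDEX_PAGE_PROP = "P1957"  # Wikisource index page URL


def build_missing_index_query(limit=100):
    """SPARQL for items that have a PG ID but no Wikisource index page."""
    return (
        "SELECT ?item ?pgId WHERE {\n"
        f"  ?item wdt:{PG_ID_PROP} ?pgId .\n"
        f"  FILTER NOT EXISTS {{ ?item wdt:{INDEX_PAGE_PROP} ?index . }}\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_query_url(query):
    """Return a GET URL asking the query service for JSON results."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    query = build_missing_index_query(limit=10)
    print(query)
    print(build_query_url(query))
```

The printed query can also be pasted into the query service's web form to inspect the results by hand.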

The importing-from-PG part seems important too (but on fr.ws, IIRC, we
already have most of the PG works).

Cdlt, ~nicolas

2016-10-14 12:55 GMT+02:00 Anika Born:

> Hi Alex,
>
> My comment was not about spending some time on a PG project versus not
> spending any time at all.
>
> The point/question (when it comes to de.WS) is a different one:
>
> (A) Should we put some of our valuable contributions into a project that
> is already freely available (in another format), or spend this time on a
> (related) project that is NOT already freely available? (And we do have a
> lot of them.)
>
> // Note: it is not about not spending any time on proofreading or the
> Wikisource project... it is about finding valuable projects/texts to invest
> our time in...
>
>
I see the thing differently: when a text is on Gutenberg, why should we
redo it again from scratch on Wikisource when we can just copy it?


> + (B) Should we spend this time on a project that may cost us the findability of
> the whole Wikisource project (and all other texts on Wikisource), because
> Google/Bing/others tag us as a fork/reuser/copy of ...? (This happened in the
> past, at least with de, when we had some texts from the commercial
> http://gutenberg.spiegel.de/, which is also supported by ABBYY with a free
> software licence.)
>

I've never heard of this before. Did it happen only on de.ws? Is it really
because of copying Gutenberg? (And was it before the proofreading, which
changed pretty much everything?)

Cdlt, ~nicolas


Re: [Wikisource-l] Importing books from Project Gutenberg

2016-10-14 Thread Jan Berkel



> Anyway, just thought I'd mention it. :-) Anyone think this is an
> avenue worth exploring? Certainly I'd love to be able to say we've got
> everything PG has *and more*!


Hello Sam,

great idea, moving texts over to WS is definitely worth doing in my
opinion. The text can be changed / reformatted  / relinked if needed and
we have the ability to export texts on demand to many formats, tweaking
the output if needed. PG only offers static files.
 
Jan
 


Re: [Wikisource-l] Importing books from Project Gutenberg

2016-10-14 Thread Alex Brollo
Back to the tool: is there some more documentation that explains, step by step, how
to run it? I imagine that it needs a Gutenberg text and a
Wikisource Index page coming from the same edition used by the Gutenberg text;
then the tool allows something like a "manual match and split". But perhaps
I didn't understand anything... I need to see the tool at work to
understand it! :-(

At its beginning, it.source uploaded many books from an Italian project,
LiberLiber, somewhat similar to Project Gutenberg, and we often convert
those ns0-only texts into proofread ones by various tricks; so I'd like to
learn anything I can from Sam's tool.

Alex


Re: [Wikisource-l] Importing books from Project Gutenberg

2016-10-14 Thread Anika Born
Hi Alex,

My comment was not about spending some time on a PG project versus not spending
any time at all.

The point/question (when it comes to de.WS) is a different one:

(A) Should we put some of our valuable contributions into a project that is already
freely available (in another format), or spend this time on a (related)
project that is NOT already freely available? (And we do have a lot of them.)

// Note: it is not about not spending any time on proofreading or the
Wikisource project... it is about finding valuable projects/texts to invest
our time in...


+ (B) Should we spend this time on a project that may cost us the findability of
the whole Wikisource project (and all other texts on Wikisource), because
Google/Bing/others tag us as a fork/reuser/copy of ...? (This happened in the
past, at least with de, when we had some texts from the commercial
http://gutenberg.spiegel.de/, which is also supported by ABBYY with a free
software licence.)


Anika


Re: [Wikisource-l] Importing books from Project Gutenberg

2016-10-14 Thread Alex Brollo
I too am very interested in both the idea and its technical
implementation, but I need some more "doc for dummies" to understand it fully
:-(

About importing already-proofread texts into Wikisource: a text in
Wikisource is different from a similar text on another web site, since it
is "a node in the wiki network", and this goal deserves, IMHO, some pain to
proofread (and re-format) it again, adding lots of wiki cross-links.

Alex




Re: [Wikisource-l] Importing books from Project Gutenberg

2016-10-13 Thread Andrea Zanni
I think the idea is good,
but I would like to try it on my own Wikisource:
could you also include the few Italian books that PG has?
Thanks!

On Fri, Oct 14, 2016 at 8:23 AM, Anika Born  wrote:

> corr1: [...] does not ha*ve*/show the scans, [...]
>
> Anika
>
> 2016-10-14 8:18 GMT+02:00 Anika Born :
>
>> Hi Sam,
>>
>> would be good, because PG does not hat/show the scans.
>>
>> But as I remember, there was/is a policy at de.ws not to use texts from
>> other projects (say: if text A is in PG, there won't be a similar text A
>> in de.WS).
>>
>> That is because, back when de.WS did use PG texts, Google treated WS as a
>> mirror of PG, and all the other (non-PG) texts were left out of Google's
>> search results as well. The (small) visibility of WS was lost completely.
>> That is the reason why there are no new projects on de.WS for texts that
>> are already available in a (nearly) similar project.
>>
>> (Besides the effort: why spend so much time on a text that is already
>> available? You'd have to proofread it at least twice.)
>>
>> But that is a special German thing.
>>
>>
>> What do the others think about it?
>> Anika
>>
>> 2016-10-14 3:20 GMT+02:00 Sam Wilson :
>>
>>> Hi all,
>>>
>>> I've been tinkering with an idea I've had for importing Project
>>> Gutenberg books into Wikisource: http://tools.wmflabs.org/pg2ws/
>>>
>>> The idea is that, if Wikidata makes a link between a PG ID number and a
>>> Wikisource Index page, then we can go through that Index page one page at a
>>> time, and copy the page's text from the PG book to the WS page.
>>>
>>> The interface so far isn't very brilliant, but I'm just trying to figure
>>> out if this is worthwhile or not. Basically, it's a matter of selecting the
>>> right chunk of text in the right-most text box (the full PG text) and
>>> hitting the button to move it left into the centre box. Then cleaning it up
>>> (manually and with the magic cleaning button) to make it match the image,
>>> and then uploading it to Wikisource.
>>>
>>> It's a bad tool though, because it doesn't handle the running header,
>>> and the copy-across button doesn't do nice things with {{hws}} etc. — not
>>> to mention all the other things it doesn't do.
>>>
>>> Anyway, just thought I'd mention it. :-) Anyone think this is an avenue
>>> worth exploring? Certainly I'd love to be able to say we've got everything
>>> PG has *and more*!
>>>
>>> —Sam
>>>
>>> PS changes made by this tool are all tagged as "OAuth CID: 638" —
>>>
>>> https://en.wikisource.org/w/index.php?title=Special:RecentChanges&tagfilter=OAuth+CID%3A+638


Re: [Wikisource-l] Importing books from Project Gutenberg

2016-10-13 Thread Anika Born
corr1: [...] does not ha*ve*/show the scans, [...]

Anika

2016-10-14 8:18 GMT+02:00 Anika Born :

> Hi Sam,
>
> would be good, because PG does not hat/show the scans.
>
> But as I remember, there was/is a policy at de.ws not to use texts from
> other projects (say: if text A is in PG, there won't be a similar text A
> in de.WS).
>
> That is because, back when de.WS did use PG texts, Google treated WS as a
> mirror of PG, and all the other (non-PG) texts were left out of Google's
> search results as well. The (small) visibility of WS was lost completely.
> That is the reason why there are no new projects on de.WS for texts that
> are already available in a (nearly) similar project.
>
> (Besides the effort: why spend so much time on a text that is already
> available? You'd have to proofread it at least twice.)
>
> But that is a special German thing.
>
>
> What do the others think about it?
> Anika
>
> 2016-10-14 3:20 GMT+02:00 Sam Wilson :
>
>> Hi all,
>>
>> I've been tinkering with an idea I've had for importing Project Gutenberg
>> books into Wikisource: http://tools.wmflabs.org/pg2ws/
>>
>> The idea is that, if Wikidata makes a link between a PG ID number and a
>> Wikisource Index page, then we can go through that Index page one page at a
>> time, and copy the page's text from the PG book to the WS page.
>>
>> The interface so far isn't very brilliant, but I'm just trying to figure
>> out if this is worthwhile or not. Basically, it's a matter of selecting the
>> right chunk of text in the right-most text box (the full PG text) and
>> hitting the button to move it left into the centre box. Then cleaning it up
>> (manually and with the magic cleaning button) to make it match the image,
>> and then uploading it to Wikisource.
>>
>> It's a bad tool though, because it doesn't handle the running header, and
>> the copy-across button doesn't do nice things with {{hws}} etc. — not to
>> mention all the other things it doesn't do.
>>
>> Anyway, just thought I'd mention it. :-) Anyone think this is an avenue
>> worth exploring? Certainly I'd love to be able to say we've got everything
>> PG has *and more*!
>>
>> —Sam
>>
>> PS changes made by this tool are all tagged as "OAuth CID: 638" —
>>
>> https://en.wikisource.org/w/index.php?title=Special:RecentChanges&tagfilter=OAuth+CID%3A+638


Re: [Wikisource-l] Importing books from Project Gutenberg

2016-10-13 Thread Anika Born
Hi Sam,

would be good, because PG does not hat/show the scans.

But as I remember, there was/is a policy at de.ws not to use texts from
other projects (say: if text A is in PG, there won't be a similar text A
in de.WS).

That is because, back when de.WS did use PG texts, Google treated WS as a
mirror of PG, and all the other (non-PG) texts were left out of Google's
search results as well. The (small) visibility of WS was lost completely.
That is the reason why there are no new projects on de.WS for texts that
are already available in a (nearly) similar project.

(Besides the effort: why spend so much time on a text that is already
available? You'd have to proofread it at least twice.)

But that is a special German thing.


What do the others think about it?
Anika

2016-10-14 3:20 GMT+02:00 Sam Wilson :

> Hi all,
>
> I've been tinkering with an idea I've had for importing Project Gutenberg
> books into Wikisource: http://tools.wmflabs.org/pg2ws/
>
> The idea is that, if Wikidata makes a link between a PG ID number and a
> Wikisource Index page, then we can go through that Index page one page at a
> time, and copy the page's text from the PG book to the WS page.
>
> The interface so far isn't very brilliant, but I'm just trying to figure
> out if this is worthwhile or not. Basically, it's a matter of selecting the
> right chunk of text in the right-most text box (the full PG text) and
> hitting the button to move it left into the centre box. Then cleaning it up
> (manually and with the magic cleaning button) to make it match the image,
> and then uploading it to Wikisource.
>
> It's a bad tool though, because it doesn't handle the running header, and
> the copy-across button doesn't do nice things with {{hws}} etc. — not to
> mention all the other things it doesn't do.
>
> Anyway, just thought I'd mention it. :-) Anyone think this is an avenue
> worth exploring? Certainly I'd love to be able to say we've got everything
> PG has *and more*!
>
> —Sam
>
> PS changes made by this tool are all tagged as "OAuth CID: 638" —
>
> https://en.wikisource.org/w/index.php?title=Special:RecentChanges&tagfilter=OAuth+CID%3A+638


[Wikisource-l] Importing books from Project Gutenberg

2016-10-13 Thread Sam Wilson

Hi all,

I've been tinkering with an idea I've had for importing Project 
Gutenberg books into Wikisource: http://tools.wmflabs.org/pg2ws/


The idea is that, if Wikidata makes a link between a PG ID number and a 
Wikisource Index page, then we can go through that Index page one page 
at a time, and copy the page's text from the PG book to the WS page.
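
For anyone wanting to try the lookup step by hand, here is a rough sketch
of how the PG ID to Index page link could be read from Wikidata. It is
not the code behind pg2ws. The property numbers (P2034 for the Project
Gutenberg ebook ID, P1957 for the Wikisource index page URL), the
function names and the User-Agent string are all assumptions made for
this example, so check them on Wikidata before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Assumptions for this sketch (not taken from the thread):
#   P2034 = Project Gutenberg ebook ID, P1957 = Wikisource index page URL.
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(pg_id):
    """Build a SPARQL query for items that carry both the PG ID and an Index page."""
    pg_id = int(pg_id)  # reject anything that is not a plain number
    return (
        "SELECT ?item ?index WHERE { "
        f'?item wdt:P2034 "{pg_id}" ; wdt:P1957 ?index . '
        "}"
    )


def parse_bindings(result):
    """Turn the JSON result of the query into (item URI, Index page URL) pairs."""
    rows = result["results"]["bindings"]
    return [(row["item"]["value"], row["index"]["value"]) for row in rows]


def find_index_pages(pg_id):
    """Run the query against the public endpoint (needs network access)."""
    params = urllib.parse.urlencode({"query": build_query(pg_id), "format": "json"})
    request = urllib.request.Request(
        SPARQL_ENDPOINT + "?" + params,
        headers={"User-Agent": "pg2ws-sketch/0.1 (hypothetical example)"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_bindings(json.load(response))
```

Calling find_index_pages() with a PG ID would return an empty list when
Wikidata has no such link yet, which is the case where the item needs
editing first.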


The interface so far isn't very brilliant, but I'm just trying to figure 
out if this is worthwhile or not. Basically, it's a matter of selecting 
the right chunk of text in the right-most text box (the full PG text) 
and hitting the button to move it left into the centre box. Then 
cleaning it up (manually and with the magic cleaning button) to make it 
match the image, and then uploading it to Wikisource.
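
To give an idea of the kind of clean-up meant here, a minimal sketch
follows. What the magic cleaning button really does is not described in
this mail, so both rules below are assumptions: joining the hard-wrapped
lines of a PG plain-text paragraph, and turning the _underscore_ italics
that PG plain texts often use into wikitext ''italics''. The function
name clean_chunk is made up for the example.

```python
import re


def clean_chunk(text):
    """Tidy a chunk of PG plain text before pasting it into a WS page."""
    # Paragraphs are separated by blank lines; inside a paragraph the
    # line breaks are only hard wrapping, so join them with spaces.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = [
        " ".join(line.strip() for line in paragraph.splitlines())
        for paragraph in paragraphs
    ]
    cleaned = "\n\n".join(joined)
    # Assumed convention: _word_ marks italics in the PG text.
    cleaned = re.sub(r"_([^_\n]+)_", r"''\1''", cleaned)
    return cleaned
```

Anything the rules get wrong still has to be fixed by hand against the
page image, as described above.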


It's a bad tool though, because it doesn't handle the running header, 
and the copy-across button doesn't do nice things with {{hws}} etc. — 
not to mention all the other things it doesn't do.
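
As an illustration of what nicer {{hws}} handling might look like, here
is a small sketch for a word that is hyphenated across a page break. It
assumes the usual pairing on English Wikisource of {{hws}} at the end of
one page and {{hwe}} at the start of the next, each taking the word part
and the full word. The helper name and its arguments are hypothetical
and the tool does nothing like this today.

```python
def split_word_across_pages(word, split_at):
    """Return the {{hws}} and {{hwe}} wikitext for a word split at split_at."""
    if not 0 < split_at < len(word):
        raise ValueError("split_at must fall inside the word")
    start, end = word[:split_at], word[split_at:]
    # First element closes the earlier page, second opens the next one.
    hws = "{{hws|" + start + "|" + word + "}}"
    hwe = "{{hwe|" + end + "|" + word + "}}"
    return hws, hwe
```

For example, split_word_across_pages("beginning", 5) gives
{{hws|begin|beginning}} for the first page and {{hwe|ning|beginning}}
for the second.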


Anyway, just thought I'd mention it. :-) Anyone think this is an avenue 
worth exploring? Certainly I'd love to be able to say we've got 
everything PG has /and more/!


—Sam

PS changes made by this tool are all tagged as "OAuth CID: 638" —

https://en.wikisource.org/w/index.php?title=Special:RecentChanges&tagfilter=OAuth+CID%3A+638
