Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-23 Thread Peter Gervai
On Tue, Jun 23, 2009 at 03:15, Platonides platoni...@gmail.com wrote:
 Although not trivial, downloading all images is in fact quite easy. You
 can find scripts to do that already made. You can also ask Brion to
 rsync3 them.
 But do you have enough space to dedicate?
 How many wikis do you want to mirror? Just Commons is more than 3 TB...

Well, disks are cheap nowadays. If it's really just a question of
asking, I, for example, may be interested.

The more complex question is the parameters of such usage, meaning
what I can do with the images after I've got them. This is the main
reason for not publishing them in the first place: the images themselves
don't indicate any particular license.

Now that I've written this, it occurs to me that it would be possible (not sure if feasible,
though) to publish CC-BY-SA pictures with author info in the comment
of the image itself. Most image formats support sizeable comment
blocks, and standardised templates make it possible to select media by
license, and get author/copyright info to put into the file.
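The idea of carrying author and license details inside the file itself can be illustrated for PNG, whose specification defines tEXt chunks with predefined keywords such as Author and Copyright. The following is a minimal Python sketch under stated assumptions, not an existing Commons or MediaWiki tool: the function names `make_text_chunk` and `add_license_text` are hypothetical, and the author and license strings are assumed to fit in Latin-1 (other scripts would need an iTXt chunk). JPEG files would need a different mechanism, such as a COM segment.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_text_chunk(keyword: str, text: str) -> bytes:
    """Build a PNG tEXt chunk: 4-byte length, type, data, CRC of type+data."""
    kw = keyword.encode("latin-1")
    if not 1 <= len(kw) <= 79:
        raise ValueError("PNG keywords must be 1-79 bytes long")
    data = kw + b"\x00" + text.encode("latin-1")
    chunk_type = b"tEXt"
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def add_license_text(png: bytes, author: str, license_name: str) -> bytes:
    """Return a copy of a PNG with Author and Copyright tEXt chunks added.

    The new chunks go right after IHDR; the pixel data is not modified.
    """
    if not png.startswith(PNG_SIGNATURE) or png[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # After the 8-byte signature, IHDR is 4 bytes length + 4 bytes type
    # + its data + a 4-byte CRC.
    ihdr_len = struct.unpack(">I", png[8:12])[0]
    ihdr_end = 8 + 4 + 4 + ihdr_len + 4
    extra = make_text_chunk("Author", author) + make_text_chunk("Copyright", license_name)
    return png[:ihdr_end] + extra + png[ihdr_end:]
```

Given the bytes of a PNG and values read from a file's license template, e.g. `add_license_text(data, "Jane Doe", "CC-BY-SA-3.0")` with placeholder values, the function returns new bytes with two extra text chunks that any PNG-aware metadata viewer can display.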

 That's the reason so few people were interested in the images when the
 image dump was available.

People are interested, generally, but not in mirroring the whole shebang. :-)

grin

___
foundation-l mailing list
foundation-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l


Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-23 Thread Samuel Klein
Yes, but my understanding is that while Google provided part of the MBP data
and scans, its continued updates to OCR since then are not being shared.  I
would be glad to learn this was not the case...

samuel klein.  s...@laptop.org.  +1 617 529 4266

On Jun 21, 2009 3:14 AM, Nikola Smolenski smole...@eunet.yu wrote:

On Saturday 20 June 2009 18:29:24 Brian wrote:

 This has reminded me to complain about Google Books. Google has the
 world's best OCR (in virtue ...
Often, these books are available in the Million Books Project too.



Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-23 Thread Anthony
On Mon, Jun 22, 2009 at 9:15 PM, Platonides platoni...@gmail.com wrote:

 Anthony wrote:
  (although I still haven't seen the WMF step up
  to the plate and make it easy for people to make a full history fork, or
  even to download all the images)

 You'll find full history dumps of almost all wikis at
 http://download.wikimedia.org/


Key word being almost.

 Although not trivial, downloading all images is in fact quite easy.


Yep.  All I need is permission.


 But do you have enough space to dedicate?


Not at the moment.  No sense in buying the drives when I don't have
permission to fill them up.


 How many wikis do you want to mirror? Just Commons is more than 3 TB...


Commons and En.wikipedia would probably be good for starters.

The main thing I want is permission to scrape en.wikipedia, though.  (Not
really scraping, as I'd probably use the API and Special:Export.  Basically
I just would like someone official to tell me how *fast* I'm allowed to use
the API and Special:Export.  Special:Export especially, because I could
easily overwhelm the servers using that, due to a bug in the script.)
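The acceptable request rate is exactly what is being asked of Wikimedia here, so no number can be assumed; whatever it turns out to be, a client can enforce it on its own side. Below is a minimal Python sketch of such a throttle, assuming a fixed minimum interval between requests. The `Throttle` class and any interval passed to it are hypothetical placeholders, not an official Wikimedia limit. MediaWiki's API separately offers a `maxlag` parameter that lets the server tell a client to back off when replication lags, which complements a client-side delay.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Keep successive requests at least `min_interval` seconds apart."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> float:
        """Block until the next request is allowed; return seconds slept."""
        now = self._clock()
        slept = 0.0
        if self._last is not None:
            remaining = self._last + self.min_interval - now
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
                now = self._clock()  # re-read the clock after sleeping
        self._last = now
        return slept
```

In a download loop, calling `throttle.wait()` immediately before each API or Special:Export request keeps requests at least `min_interval` seconds apart, so a bug elsewhere in the script cannot turn into a flood of requests. Injecting `clock` and `sleep` makes the timing testable without real delays.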

 That's the reason so few people were interested in the images when the
 image dump was available.


I downloaded it.  It was well under 1 TB at the time.


Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-23 Thread Brian
2009/6/23 Samuel Klein meta...@gmail.com

 Yes, but my understanding is that while Google provided part of the MBP
 data and scans, its continued updates to OCR since then are not being
 shared.  I would be glad to learn this was not the case...


The dataset you need to train an OCR system to be as good as theirs is the
raw images and the plain text. They aren't making it easy to get either of
those things :( They have presumably improved the software in other ways as
well.

WTF GOOG?


Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-23 Thread Michael Snow
Brian wrote:
 2009/6/23 Samuel Klein meta...@gmail.com
   
 Yes, but my understanding is that while Google provided part of the MBP
 data and scans, its continued updates to OCR since then are not being
 shared.  I would be glad to learn this was not the case...
 
 The dataset you need to train an OCR system to be as good as theirs is the
 raw images and the plain text. They aren't making it easy to get either of
 those things :( They have presumably improved the software in other ways as
 well.

 WTF GOOG?
   
Well, when your shorthand uses their stock ticker symbol, your argument 
has already been coopted.

--Michael Snow



Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-23 Thread Brian
On Tue, Jun 23, 2009 at 11:44 AM, Michael Snow wikipe...@verizon.net wrote:


  The dataset you need to train an OCR system to be as good as theirs is
  the raw images and the plain text. They aren't making it easy to get
  either of those things :( They have presumably improved the software in
  other ways as well.
 
  WTF GOOG?
 
 Well, when your shorthand uses their stock ticker symbol, your argument
 has already been coopted.

 --Michael Snow


I get the joke, but um, I used it on purpose, and which one of my arguments
has been coopted??


Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-23 Thread Michael Snow
Brian wrote:
 On Tue, Jun 23, 2009 at 11:44 AM, Michael Snow wikipe...@verizon.net wrote:

   The dataset you need to train an OCR system to be as good as theirs is
   the raw images and the plain text. They aren't making it easy to get
   either of those things :( They have presumably improved the software in
   other ways as well.

   WTF GOOG?

  Well, when your shorthand uses their stock ticker symbol, your argument
  has already been coopted.

  --Michael Snow

 I get the joke, but um, I used it on purpose, and which one of my arguments
 has been coopted??
   
Coopting is not like rebutting; it does not bite chunks out of specific 
pieces, it swallows whole. Symbols are powerful things, perhaps even 
more so outside the mathematical logic of argument. They do not serve 
only your purposes, even if you use them purposefully. My observations 
may be wry, but they are not entirely in jest.

--Michael Snow



Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-23 Thread Brian
OK, Shakespeare. But in plain English you appear to be saying that
corporations are inherently greedy and have a tendency to be evil. Sure, but
we expect more out of GOOG. This is not MSFT we are talking about.

On Tue, Jun 23, 2009 at 12:13 PM, Michael Snow wikipe...@verizon.netwrote:

 Brian wrote:
  On Tue, Jun 23, 2009 at 11:44 AM, Michael Snow wikipe...@verizon.net wrote:

    The dataset you need to train an OCR system to be as good as theirs is
    the raw images and the plain text. They aren't making it easy to get
    either of those things :( They have presumably improved the software in
    other ways as well.

    WTF GOOG?

   Well, when your shorthand uses their stock ticker symbol, your argument
   has already been coopted.

   --Michael Snow

  I get the joke, but um, I used it on purpose, and which one of my arguments
  has been coopted??
 
 Coopting is not like rebutting; it does not bite chunks out of specific
 pieces, it swallows whole. Symbols are powerful things, perhaps even
 more so outside the mathematical logic of argument. They do not serve
 only your purposes, even if you use them purposefully. My observations
 may be wry, but they are not entirely in jest.

 --Michael Snow



Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-23 Thread Anthony
On Tue, Jun 23, 2009 at 1:09 PM, Brian brian.min...@colorado.edu wrote:

 2009/6/23 Samuel Klein meta...@gmail.com

  Yes, but my understanding is that while Google provided part of the MBP
  data and scans, its continued updates to OCR since then are not being
  shared.  I would be glad to learn this was not the case...
 

 The dataset you need to train an OCR system to be as good as theirs is the
 raw images and the plain text. They aren't making it easy to get either of
 those things :( They have presumably improved the software in other ways as
 well.

 WTF GOOG?


It's almost like they're trying to run a business or something.


Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-23 Thread Anthony
On Tue, Jun 23, 2009 at 2:24 PM, Brian brian.min...@colorado.edu wrote:

 OK, Shakespeare. But in plain English you appear to be saying that
 corporations are inherently greedy and have a tendency to be evil. Sure, but
 we expect more out of GOOG. This is not MSFT we are talking about.


Of course they're inherently greedy.  That's the whole purpose of a
for-profit corporation - to make as much money as possible for its
shareholders.  As for tendency to be evil, I think that rests on your
definition of evil.


Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-23 Thread Anthony
On Tue, Jun 23, 2009 at 3:58 PM, Anthony wikim...@inbox.org wrote:

 On Tue, Jun 23, 2009 at 2:24 PM, Brian brian.min...@colorado.edu wrote:

 OK, Shakespeare. But in plain English you appear to be saying that
 corporations are inherently greedy and have a tendency to be evil. Sure, but
 we expect more out of GOOG. This is not MSFT we are talking about.


 Of course they're inherently greedy.  That's the whole purpose of a
 for-profit corporation - to make as much money as possible for its
 shareholders.


I guess even a non-profit is inherently greedy, it's just greedy for
something other than money.  The WMF is greedy for the spread of free
knowledge.

But this is off-topic.  Let's take it to another list or something.


Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-23 Thread John Vandenberg
On Wed, Jun 24, 2009 at 6:10 AM, Anthony wikim...@inbox.org wrote:

 On Tue, Jun 23, 2009 at 3:58 PM, Anthony wikim...@inbox.org wrote:

  On Tue, Jun 23, 2009 at 2:24 PM, Brian brian.min...@colorado.edu wrote:
 
  OK, Shakespeare. But in plain English you appear to be saying that
  corporations are inherently greedy and have a tendency to be evil. Sure, but
  we expect more out of GOOG. This is not MSFT we are talking about.
 
 
  Of course they're inherently greedy.  That's the whole purpose of a
  for-profit corporation - to make as much money as possible for its
  shareholders.
 

 I guess even a non-profit is inherently greedy, it's just greedy for
 something other than money.  The WMF is greedy for the spread of free
 knowledge.

 But this is off-topic.  Let's take it to another list or something.

off-topic?? ... surely you jest!!

I think about _three_ of the 50+ emails in this thread have been on
the topic of open access journal articles on Wikisource.

--
John Vandenberg



Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-22 Thread Platonides
Anthony wrote:
 On Sun, Jun 21, 2009 at 7:54 AM, John Vandenberg jay...@gmail.com wrote:
 
 Whether Google is good or evil is off-topic, and irrelevant to boot.

 
 Whether or not they have a right to exclude bots isn't.
 
 Also worth noting, Project Gutenberg has digitised fewer than 30,000
 books since 1971.  Distributed Proofreaders has done 15,000 of those
 since 2000, so throughput is picking up.  But there are more than
 enough to keep everyone busy for a very long time.
 
 
 The interesting thing is, even if you don't use a bot, it's still faster to
 copy/paste from Google manually than it is to get the book and scan it in
 yourself (assuming you don't want to destroy the original, anyway).
 
 If you're going to make a project out of OCRing books that Google has already
 OCRed, I don't see any point in reinventing the scanning or first pass
 OCRing part.

IMHO the interesting bit would be to make a Google Books browser that
prefills the wiki editor.




Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-22 Thread Mark Wagner
On Sat, Jun 20, 2009 at 14:35, Ray Saintonge sainto...@telus.net wrote:
 Brian wrote:
 That is against the law. It violates Google's ToS.

 I'm mostly complaining that Google is being Very Evil. There is nothing we
 can do about it except complain to them. Which I don't know how to do - they
 apparently believe that the plain text versions of their books are akin to
 their intellectual property and are unwilling to give them away.


 How is violating Google's ToS against the law?

The verdict in _United States v. Lori Drew_ appears to set a precedent
that violating a site's Terms of Service is a violation of the
Computer Fraud and Abuse Act.  It's not a very strong precedent, but
it's still there.

-- 
Mark
[[en:User:Carnildo]]



Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-22 Thread Dan Rosenthal
The statute supports that as well, providing a private right of action
and civil remedy. It's not entirely that cut and dried (there are
certain restrictions that must be met), but yeah, it appears that in
some cases TOS violations can be illegal.

-Dan
On Jun 22, 2009, at 7:49 PM, Mark Wagner wrote:

 On Sat, Jun 20, 2009 at 14:35, Ray Saintonge sainto...@telus.net wrote:
 Brian wrote:
 That is against the law. It violates Google's ToS.

 I'm mostly complaining that Google is being Very Evil. There is nothing we
 can do about it except complain to them. Which I don't know how to do - they
 apparently believe that the plain text versions of their books are akin to
 their intellectual property and are unwilling to give them away.


 How is violating Google's ToS against the law?

 The verdict in _United States v. Lori Drew_ appears to set a precedent
 that violating a site's Terms of Service is a violation of the
 Computer Fraud and Abuse Act.  It's not a very strong precedent, but
 it's still there.

 -- 
 Mark
 [[en:User:Carnildo]]



Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-22 Thread Platonides
Anthony wrote:
 (although I still haven't seen the WMF step up
 to the plate and make it easy for people to make a full history fork, or
 even to download all the images)

You'll find full history dumps of almost all wikis at
http://download.wikimedia.org/

Although not trivial, downloading all images is in fact quite easy. You
can find scripts to do that already made. You can also ask Brion to
rsync3 them.
But do you have enough space to dedicate?
How many wikis do you want to mirror? Just Commons is more than 3 TB...

That's the reason so few people were interested in the images when the
image dump was available.




Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread Ray Saintonge
Samuel Klein wrote:
 There is a wealth of work done all the time by primary source
 researchers and publishers, which could be improved on by having
 wikisource entries, translations, &c.

 Related question : how appropriate would large numbers of public
 domain texts, with page scans and the best available OCR [and
 translations of same], fit with what Wikisource does now?  This is
 clearly a wiki project that needs to happen : OCR even at its best
 misses rare meaning-bearing words.   If not Wikisource, where should
 this work take place?
   
 From my perspective it fits perfectly with the vision that I had of 
Wikisource on the first day of its existence.  Tim Armstrong 
[[User:Tarmstro99]] has already done a considerable amount of valuable 
work relating to law on Wikisource.  That has been mostly a one-man 
project to deal with a massive amount of material.  Some have even 
proposed deleting all the US Code material on the grounds that we don't 
have the ability to keep it up to date. That has prompted some very 
interesting questions and ideas about how this kind of stuff might be 
handled, but taking those questions to the next level requires lots of 
work.  Most regular Wikisourcerors already have long personal to-do 
lists to keep them busy.  So the question is not really about whether 
Wikisource should host these goods, it's about recruiting volunteers to 
do the hard work.

Ec

 On Sat, Jun 20, 2009 at 11:41 AM, David Gerard dger...@gmail.com wrote:
   
 http://blogs.law.harvard.edu/infolaw/2009/06/19/using-wikisource-as-an-alternative-open-access-repository-for-legal-scholarship/

 Interesting. How well does this fit with what Wikisource does?


 - d.

 




Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread Nikola Smolenski
On Saturday 20 June 2009 18:29:24 Brian wrote:
 This has reminded me to complain about Google Books. Google has the world's
 best OCR (in virtue of having the largest OCR'able dataset) and also has a
 mission to scan in all the public domain books they can get their hand on.
 They recently updated their interface to, as they put it, "make it easier
 to find our plain text versions of public domain books." If a book is
 available in full view, you can click the 'Plain text' button in the
 toolbar. Unfortunately the only way I've found to download the full text
 of a public domain book from Google is to flip through the book a page at a
 time, copying the text to your clipboard.

Often, these books are available in the Million Books Project too.



Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread John Vandenberg
On Sun, Jun 21, 2009 at 1:41 AM, David Gerard dger...@gmail.com wrote:

 http://blogs.law.harvard.edu/infolaw/2009/06/19/using-wikisource-as-an-alternative-open-access-repository-for-legal-scholarship/

 Interesting. How well does this fit with what Wikisource does?

Tim Armstrong is a sysop on Wikisource ... :-)  more below..

On Sun, Jun 21, 2009 at 4:17 PM, Ray Saintonge sainto...@telus.net wrote:

 Samuel Klein wrote:
  There is a wealth of work done all the time by primary source
  researchers and publishers, which could be improved on by having
  wikisource entries, translations, &c.
 
  Related question : how appropriate would large numbers of public
  domain texts, with page scans and the best available OCR [and
  translations of same], fit with what Wikisource does now?  This is
  clearly a wiki project that needs to happen : OCR even at its best
  misses rare meaning-bearing words.   If not Wikisource, where should
  this work take place?

If it was published, Wikisource accepts it.  Notability is not a consideration.

The only other open project of comparable size is [[Distributed
Proofreaders]].  Here are our statistics:

http://wikisource.org/wiki/Wikisource:ProofreadPage_Statistics

Most of the Wikisource projects accept free translations.

http://wikisource.org/wiki/WS:COORD

The two English Wikisource featured translations are:

  http://en.wikisource.org/wiki/Balade_to_Rosemounde
  http://en.wikisource.org/wiki/J%27accuse
  (also translated into Dutch)

The two biggest translation projects that I know of are:

  http://en.wikisource.org/wiki/Romance_of_the_Three_Kingdoms
  http://en.wikisource.org/wiki/Bible_(Wikisource)

Another good one is

  http://en.wikisource.org/wiki/Max_Havelaar_(Wikisource)

We also have translations of laws, usually relating to copyright.

  http://en.wikisource.org/wiki/Ordinance_93-027_of_30_March_1993_on_copyright,_related_rights_and_expressions_of_folklore

  From my perspective it fits perfectly with the vision that I had of
 Wikisource on the first day of its existence.  Tim Armstrong
 [[User:Tarmstro99]] has already done a considerable amount of valuable
 work relating to law on Wikisource.

Tim has been doing high impact work in this area.

   H.R. Rep. No. 94-1476

http://blogs.law.harvard.edu/infolaw/2008/06/17/an-open-access-success-story-just-in-time-for-cali/

   U.S. Statutes at Large

http://blogs.law.harvard.edu/infolaw/2008/06/02/public-records-one-jpeg-at-a-time/

   http://en.wikisource.org/wiki/United_States_Statutes_at_Large

With regard to the USC, the majority of it is a mess, but Title 17 is a
great example of where we are heading.

  http://en.wikisource.org/wiki/United_States_Code/Title_17

We also have transcription projects for the UK 1911 copyright act,
which has influenced so many other countries.

http://en.wikisource.org/wiki/Index:The_copyright_act,_1911,_annotated.djvu
http://en.wikisource.org/wiki/Index:A_treatise_upon_the_law_of_copyright.djvu

More can be found from our freshly minted Law index:

   http://en.wikisource.org/wiki/Wikisource:Law

Our two featured texts are:
   http://en.wikisource.org/wiki/South_Africa_Act_1909
   http://en.wikisource.org/wiki/ACLU_v._NSA_(District_Court_opinion)

 Most regular Wikisourcerors already have long personal to-do
 lists to keep them busy.  So the question is not really about whether
 Wikisource should host these goods, it's about recruiting volunteers to
 do the hard work.

If people want to help but don't know where to start, my
recommendation is that they start proofreading the first volume of the
Statutes at Large, as this is a goldmine of interesting documents, and
will be an excellent example of crowdsourcing of transcription.

http://en.wikisource.org/wiki/Index:United_States_Statutes_at_Large/Volume_1

Enjoy,
John Vandenberg



Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread Anthony
On Sun, Jun 21, 2009 at 1:51 AM, Ray Saintonge sainto...@telus.net wrote:

 Stephen Bain wrote:
  On Sun, Jun 21, 2009 at 5:27 AM, Parker Higgins parkerhigg...@gmail.com wrote:
 
  Except google isn't asserting any kind of copyright control over these
  books, they're just not making it convenient to download them in your
  preferred format.  Maybe not The Right Thing, but not as boneheaded as
  suing a party who reprints public domain material, as was the case in
  Feist v. Rural (the supreme court case you mention.)
 
  They want people to use their service. Fair enough, given that the
  scanning and OCRing happened on their dime.
 
 
 How does that give them any special rights?  There are no database
 protection laws in the US, and sweat-of-the-brow has been rejected as a
 basis for new copyrights.


You're right, it doesn't give them any *special* rights.  They have the same
rights as any other computer owner.  Specifically, they have the right to
choose who uses their computers, and how they use them.  Whether or not a
terms of service is legally binding is really not the issue. (*)  The issue
is whether or not they have a duty to make it *convenient* for you to
download the data.  Of course they don't.  Why should they be required to
help you put them out of business?  That kind of twisted logic might make
sense in the non-profit world (although I still haven't seen the WMF step up
to the plate and make it easy for people to make a full history fork, or
even to download all the images), but Google is not a non-profit
organization.  Google would be Evil if it *didn't* protect itself against
this, as it'd be breaking a promise to its shareholders.

(*) Personally, I'm of the opinion that merely accessing a website is not
sufficient to bind a websurfer to a TOS, and that at most a TOS which you do
not even have to click "agree" to is a unilateral contract which can only
impose promises upon the offeror, though this is not a legal opinion but
merely my opinion of what the law should be.


Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread Anthony
On Sun, Jun 21, 2009 at 7:17 AM, Anthony wikim...@inbox.org wrote:

 (*) Personally, I'm of the opinion that merely accessing a website is not
 sufficient to bind a websurfer to a TOS, and that at most a TOS which you do
 not even have to click "agree" to is a unilateral contract which can only
 impose promises upon the offeror, though this is not a legal opinion but
 merely my opinion of what the law should be.


You know what, after further thought I'm going to withdraw that.  First of
all, I think Google does require you to click "agree" before you can access
the service we're talking about.  But more importantly, I'm going to cast
doubt on my previously held opinion of whether or not a TOS should be able
to bind someone who didn't click on anything.  If I leave a bunch of apples
on the table at work and put next to it a sign that says "Apples: $.25
each"...  I don't know, I'll have to think about it.


Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread John Vandenberg
On Sun, Jun 21, 2009 at 9:17 PM, Anthony wikim...@inbox.org wrote:

 On Sun, Jun 21, 2009 at 1:51 AM, Ray Saintonge sainto...@telus.net wrote:

  Stephen Bain wrote:
   On Sun, Jun 21, 2009 at 5:27 AM, Parker Higgins parkerhigg...@gmail.com wrote:
  
   Except google isn't asserting any kind of copyright control over these
   books, they're just not making it convenient to download them in your
   preferred format.  Maybe not The Right Thing, but not as boneheaded as
   suing a party who reprints public domain material, as was the case in
   Feist v. Rural (the supreme court case you mention.)
  
   They want people to use their service. Fair enough, given that the
   scanning and OCRing happened on their dime.
  
  
  How does that give them any special rights?  There are no database
  protection laws in the US, and sweat-of-the-brow has been rejected as a
  basis for new copyrights.


 You're right, it doesn't give them any *special* rights.  They have the same
 rights as any other computer owner.  Specifically, they have the right to
 choose who uses their computers, and how they use them.  Whether or not a
 terms of service is legally binding is really not the issue. (*)  The issue
 is whether or not they have a duty to make it *convenient* for you to
 download the data.  Of course they don't.  Why should they be required to
 help you put them out of business?  That kind of twisted logic might make
 sense in the non-profit world (although I still haven't seen the WMF step up
 to the plate and make it easy for people to make a full history fork, or
 even to download all the images), but Google is not a non-profit
 organization.  Google would be Evil if it *didn't* protect itself against
 this, as it'd be breaking a promise to its shareholders.

 (*) Personally, I'm of the opinion that merely accessing a website is not
 sufficient to bind a websurfer to a TOS, and that at most a TOS which you do
 not even have to click "agree" to is a unilateral contract which can only
 impose promises upon the offeror, though this is not a legal opinion but
 merely my opinion of what the law should be.

Whether Google is good or evil is off-topic, and irrelevant to boot.

There are nearly _750,000_ books from Google that are available on
archive.org, available in DJVU format with OCR.

  http://www.archive.org/details/googlebooks

Microsoft donated many texts directly to IA, but that approach only
netted 440,000 books.

  http://www.archive.org/details/msn_books

See here for more of the collections:
   http://www.archive.org/details/texts

Also worth noting, Project Gutenberg has digitised fewer than 30,000
books since 1971.  Distributed Proofreaders has done 15,000 of those
since 2000, so throughput is picking up.  But there are more than
enough to keep everyone busy for a very long time.

--
John Vandenberg



Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread Anthony
On Sun, Jun 21, 2009 at 7:54 AM, John Vandenberg jay...@gmail.com wrote:

 Whether Google is good or evil is off-topic, and irrelevant to boot.


Whether or not they have a right to exclude bots isn't.

 Also worth noting, Project Gutenberg has digitised fewer than 30,000
 books since 1971.  Distributed Proofreaders has done 15,000 of those
 since 2000, so throughput is picking up.  But there are more than
 enough books to keep everyone busy for a very long time.


The interesting thing is, even if you don't use a bot, it's still faster to
copy/paste from Google manually than it is to get the book and scan it in
yourself (assuming you don't want to destroy the original, anyway).

If you're going to make a project out of OCRing books that Google has already
OCRed, I don't see any point in reinventing the scanning or first-pass
OCRing part.


Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread John Vandenberg
On Sun, Jun 21, 2009 at 10:07 PM, Anthony wikim...@inbox.org wrote:

 On Sun, Jun 21, 2009 at 7:54 AM, John Vandenberg jay...@gmail.com wrote:

  Whether Google is good or evil is off-topic, and irrelevant to boot.
 

 Whether or not they have a right to exclude bots isn't.

Actually, it is.  This mailing list is about the Wikimedia Foundation
and its projects, and this thread is about Wikisource.  Anyone who has
done significant amounts of Wikisource work will tell you that they
don't consider the Google Books click-through license to be a problem
that needs discussing at this level.

Do you think that 750,000 Google Books were manually converted to
DJVU, and copied over to Internet Archive?

Is there a book that you seek that isn't available at Internet Archive?

I wrote a Greasemonkey user script to scrape the text from Google
Books; it is now broken and unmaintained because I no longer need to
take text from Google Books, as the vast majority of the texts I want
are now on Internet Archive, and that is a more productive workflow.

 Also worth noting, Project Gutenberg has digitised fewer than 30,000
  books since 1971.  Distributed Proofreaders has done 15,000 of those
  since 2000, so throughput is picking up.  But there are more than
  enough books to keep everyone busy for a very long time.


 The interesting thing is, even if you don't use a bot, it's still faster to
 copy/paste from Google manually than it is to get the book and scan it in
 yourself (assuming you don't want to destroy the original, anyway).

No, it is quicker to download the DJVU file from Internet Archive,
upload it to Wikisource, set up a transcription project, fix the
OCR text there, and then copy and paste it wherever you like.

It takes about 10 minutes unless there is some copyright concern.
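The first step of that workflow, fetching a scan and its OCR text from Internet Archive, can be sketched in Python. This is a minimal sketch under stated assumptions, not the tooling anyone in this thread used: it assumes the Archive's metadata endpoint (`https://archive.org/metadata/<identifier>`) returns a JSON "files" list, that individual files are served from `https://archive.org/download/<identifier>/<file>`, and that the derived OCR text carries a `_djvu.txt` suffix. The helper names are hypothetical, and uploading to Wikisource is not covered.

```python
import json
import urllib.request
from urllib.parse import quote

ARCHIVE = "https://archive.org"

def metadata_url(identifier: str) -> str:
    # Assumed endpoint: JSON describing the item, including a "files" list.
    return f"{ARCHIVE}/metadata/{quote(identifier)}"

def download_url(identifier: str, filename: str) -> str:
    # Assumed URL pattern for fetching one file from an item.
    return f"{ARCHIVE}/download/{quote(identifier)}/{quote(filename)}"

def pick_files(metadata: dict, suffixes=(".djvu", "_djvu.txt")) -> list:
    # Keep only the DjVu scan and the (assumed) plain-text OCR file names.
    return [f["name"] for f in metadata.get("files", [])
            if f.get("name", "").endswith(suffixes)]

def fetch_metadata(identifier: str) -> dict:
    # Performs a network request; returns the parsed JSON metadata.
    with urllib.request.urlopen(metadata_url(identifier), timeout=30) as resp:
        return json.load(resp)
```

For example, `pick_files(fetch_metadata("happinessessays00hiltgoog"))` would list the DjVu and text files of an item mentioned elsewhere in this thread, if the assumptions above hold; each name can then be passed to `download_url`.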

 If you're going to make a project out of OCRing books that Google has already
 OCRed, I don't see any point in reinventing the scanning or first-pass
 OCRing part.

I suggest you take a look at a few of the DJVU files provided by
Internet Archive.  Then you can point out real faults that you see.

--
John Vandenberg



Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread Anthony
On Sun, Jun 21, 2009 at 8:35 AM, John Vandenberg jay...@gmail.com wrote:

 I suggest you take a look at a few of the DJVU files provided by
 Internet Archive.  Then you can point out real faults that you see.


I will.  My apologies for misunderstanding your email.


Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread Anthony
On Sun, Jun 21, 2009 at 10:23 AM, Anthony wikim...@inbox.org wrote:

 On Sun, Jun 21, 2009 at 8:35 AM, John Vandenberg jay...@gmail.com wrote:

 I suggest you take a look at a few of the DJVU files provided by
 Internet Archive.  Then you can point out real faults that you see.


 I will.  My apologies for misunderstanding your email.


Okay, http://www.archive.org/details/catholicencyclo16herbgoog happened to
be the first book I randomly picked from Google Book Search.  There's no
text version.

And the text versions I find of other editions seem to be much, much worse
than the Google OCR results.


Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread Anthony
On Sun, Jun 21, 2009 at 10:55 AM, Anthony wikim...@inbox.org wrote:

 On Sun, Jun 21, 2009 at 10:23 AM, Anthony wikim...@inbox.org wrote:

 On Sun, Jun 21, 2009 at 8:35 AM, John Vandenberg jay...@gmail.com wrote:

 I suggest you take a look at a few of the DJVU files provided by
 Internet Archive.  Then you can point out real faults that you see.


 I will.  My apologies for misunderstanding your email.


 Okay, http://www.archive.org/details/catholicencyclo16herbgoog happened to
 be the first book I randomly picked from Google Book Search.  There's no
 text version.

 And the text versions I find of other editions seem to be much, much worse
 than the Google OCR results.


http://books.google.com/books?id=TZ0UYAAJ strike two, not even there.
http://books.google.com/books?id=PYAaYAAJ strike three
http://www.archive.org/details/happinessessays00hiltgoog finally...let's
compare the OCR:

Great numbers of thoughtful people are just now much perplexed to know what
to make of the faffs of life, and are looking about them for some reasonable
interpretation of the modern world. They cannot abandon the work of the
world, but they are conscious that they have not learned the art of work.

Greaf numbers of thoughtful people are just now much perplexed to know what
to make of thefaSls of life^ and are looking about them for some reasonable
interpretation of the modem world. They cannot abandon the work of the
worlds but they are conscious that they have not learned the art of work.
---
Few people, however, really know how to work, and even in an age when
oftener perhaps than ever before we hear of work and workers one cannot
observe that the art of work makes much positive progress. On the contrary,
the general inclination seems to be to work as little as possible, or to
work for a short time in order to pass the remainder of one's life in rest.

Few people, however,  really know how to work, and even in an age when
oftener perhaps than ever before we hear of work  and  workers  one
cannotobserve that the art of work makes much positive progress. On the
contrary, the general inclination seems to be to work as little as possible,
or to work for a short time in order to pass the remainder of one's life in
rest. 
---
I guess that's acceptable.  The Catholic encyclopedia results were much
worse, though.  Maybe it was a font thing, but I'm not quite interested
enough to bother doing a more in-depth study right now.
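A side-by-side reading like the one above can be made a little more systematic with a word-level diff. The sketch below uses Python's standard `difflib.SequenceMatcher`; splitting on whitespace is a simplifying assumption (it ignores the doubled spaces in the second sample but still catches run-together words such as "cannotobserve"), and `word_diffs` and `similarity` are hypothetical helper names. It only locates disagreements; it cannot say which text is correct.

```python
from difflib import SequenceMatcher

def word_diffs(a: str, b: str) -> list:
    """Return (words_in_a, words_in_b) for each span where the two texts differ."""
    a_words, b_words = a.split(), b.split()
    matcher = SequenceMatcher(None, a_words, b_words, autojunk=False)
    return [(a_words[i1:i2], b_words[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]

def similarity(a: str, b: str) -> float:
    """Share of matching words: 1.0 means identical, 0.0 means nothing in common."""
    return SequenceMatcher(None, a.split(), b.split(), autojunk=False).ratio()
```

Run on the first pair of paragraphs quoted above, `word_diffs` reports spans such as "Great"/"Greaf" and "the faffs"/"thefaSls", which is exactly where a proofreader would need to look.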


Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread John Vandenberg
On Sun, Jun 21, 2009 at 1:41 AM, David Gerard dger...@gmail.com wrote:

 http://blogs.law.harvard.edu/infolaw/2009/06/19/using-wikisource-as-an-alternative-open-access-repository-for-legal-scholarship/

 Interesting. How well does this fit with what Wikisource does?

Here are seven articles from PLoS One.

http://en.wikisource.org/wiki/Category:Plosone

We have other published material that has been released under CC licenses:

http://en.wikisource.org/wiki/Unhappy_Thought

And books under various licenses:

http://en.wikisource.org/wiki/Bulgarian_Policies_on_the_Republic_of_Macedonia
http://en.wikisource.org/wiki/A_Short_History_of_Russian_%22Fantastica%22
http://en.wikisource.org/wiki/Free_as_in_Freedom

--
John Vandenberg



Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread Ray Saintonge
Anthony wrote:
 On Sun, Jun 21, 2009 at 10:55 AM, Anthony wrote:
   
 Okay, http://www.archive.org/details/catholicencyclo16herbgoog happened to
 be the first book I randomly picked from Google Book Search.  There's no
 text version.

 And the text versions I find of other editions seem to be much, much worse
 than the Google OCR results.
 
 http://books.google.com/books?id=TZ0UYAAJ strike two, not even there.
 http://books.google.com/books?id=PYAaYAAJ strike three
 http://www.archive.org/details/happinessessays00hiltgoog finally...let's
 compare the OCR:

 Great numbers of thoughtful people are just now much perplexed to know what
 to make of the faffs of life, and are looking about them for some reasonable
 interpretation of the modern world. They cannot abandon the work of the
 world, but they are conscious that they have not learned the art of work.

 Greaf numbers of thoughtful people are just now much perplexed to know what
 to make of thefaSls of life^ and are looking about them for some reasonable
 interpretation of the modem world. They cannot abandon the work of the
 worlds but they are conscious that they have not learned the art of work.
 ---
 Few people, however, really know how to work, and even in an age when
 oftener perhaps than ever before we hear of work and workers one cannot
 observe that the art of work makes much positive progress. On the contrary,
 the general inclination seems to be to work as little as possible, or to
 work for a short time in order to pass the remainder of one's life in rest.

 Few people, however,  really know how to work, and even in an age when
 oftener perhaps than ever before we hear of work  and  workers  one
 cannotobserve that the art of work makes much positive progress. On the
 contrary, the general inclination seems to be to work as little as possible,
 or to work for a short time in order to pass the remainder of one's life in
 rest. 
 ---
 I guess that's acceptable.  The Catholic encyclopedia results were much
 worse, though.  Maybe it was a font thing, but I'm not quite interested
 enough to bother doing a more in-depth study right now.
Who is expecting OCR to be perfect anywhere?  In the absence of real 
human proofreading I assume any OCR material to be fraught with errors. 
Wikisource aims to accurately reproduce what was published, including 
original errors.  Scans alone provide the needed accuracy, but they are 
not suitable for the added value of wikification.
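As a rough aid to the human proofreading described here, one could flag OCR tokens that are missing from a word list. This is a hypothetical sketch, not a Wikisource tool: the vocabulary is an assumption supplied by the caller, and the check is deliberately naive (ASCII letters only). It also shows why proofreading cannot be skipped: "modem" is a real word, so the misreading of "modern" in the sample quoted above would pass unflagged.

```python
import re

# Runs of ASCII letters; punctuation such as "^" or "," acts as a separator.
# ASCII-only matching is a simplifying assumption.
WORD = re.compile(r"[A-Za-z]+")

def suspect_words(text: str, vocabulary) -> list:
    """Return tokens from OCR text that are not in the caller-supplied vocabulary."""
    vocab = {w.lower() for w in vocabulary}
    return [tok for tok in WORD.findall(text) if tok.lower() not in vocab]
```

With a small vocabulary containing "great", "the", "facts" and "modem", the OCR line "Greaf numbers of thefaSls of life^ and the modem world" yields only "Greaf" and "thefaSls" as suspects.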

Ec



[Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread David Gerard
http://blogs.law.harvard.edu/infolaw/2009/06/19/using-wikisource-as-an-alternative-open-access-repository-for-legal-scholarship/

Interesting. How well does this fit with what Wikisource does?


- d.



Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Samuel Klein
There is a wealth of work done all the time by primary source
researchers and publishers, which could be improved on by having
Wikisource entries, translations, &c.

Related question : how appropriate would large numbers of public
domain texts, with page scans and the best available OCR [and
translations of same], fit with what Wikisource does now?  This is
clearly a wiki project that needs to happen : OCR even at its best
misses rare meaning-bearing words.   If not Wikisource, where should
this work take place?

SJ

On Sat, Jun 20, 2009 at 11:41 AM, David Gerard dger...@gmail.com wrote:
 http://blogs.law.harvard.edu/infolaw/2009/06/19/using-wikisource-as-an-alternative-open-access-repository-for-legal-scholarship/

 Interesting. How well does this fit with what Wikisource does?


 - d.



Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Brian
This has reminded me to complain about Google Books. Google has the world's
best OCR (by virtue of having the largest OCR'able dataset) and also has a
mission to scan in all the public domain books they can get their hands on.
They recently updated their interface to, as they put it, "make it easier to
find our plain text versions of public domain books. If a book is available
in full view, you can click the 'Plain text' button in the toolbar."
Unfortunately the only way I've found to download the full text of a public
domain book from Google is to flip through the book a page at a time,
copying the text to your clipboard.
There are roughly 2-3 million public domain books in Google Books.


On Sat, Jun 20, 2009 at 10:10 AM, Samuel Klein meta...@gmail.com wrote:

 There is a wealth of work done all the time by primary source
 researchers and publishers, which could be improved on by having
 Wikisource entries, translations, &c.

 Related question : how appropriate would large numbers of public
 domain texts, with page scans and the best available OCR [and
 translations of same], fit with what Wikisource does now?  This is
 clearly a wiki project that needs to happen : OCR even at its best
 misses rare meaning-bearing words.   If not Wikisource, where should
 this work take place?

 SJ

 On Sat, Jun 20, 2009 at 11:41 AM, David Gerard dger...@gmail.com wrote:
 
 http://blogs.law.harvard.edu/infolaw/2009/06/19/using-wikisource-as-an-alternative-open-access-repository-for-legal-scholarship/
 
  Interesting. How well does this fit with what Wikisource does?
 
 
  - d.
 


Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Platonides
Brian wrote:
 Unfortunately the only way I've found to download the full text of a public
 domain book from Google is to flip through the book a page at a time,
 copying the text to your clipboard.
 There are roughly 2-3 million public domain books in Google Books.

That's easy to fix :)




Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Brian
Not likely. I've been banned from Google's regular search at least a dozen
times during semi-frenetic search sprees in which I was identified as a bot.
There is no doubt that if you try to automate it you will be quickly shot
down.

On Sat, Jun 20, 2009 at 12:02 PM, Platonides platoni...@gmail.com wrote:

 Brian wrote:
  Unfortunately the only way I've found to download the full text of a
 public
  domain book from Google is to flip through the book a page at a time,
  copying the text to your clipboard.
  There are roughly 2-3 million public domain books in Google Books.

 That's easy to fix :)




Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Anthony
Easier than scanning, though :)

On Sat, Jun 20, 2009 at 2:04 PM, Brian brian.min...@colorado.edu wrote:

 Not likely. I've been banned from Google's regular search at least a dozen
 times during semi-frenetic search sprees in which I was identified as a
 bot.
 There is no doubt that if you try to automate it you will be quickly shot
 down.


Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Falcorian
So the bot just has to run at human speeds so it does not get banned; it
still won't get tired or make unpredictable mistakes. And you can run it
from different IPs to parallelize.

--Falcorian

On Sat, Jun 20, 2009 at 11:04 AM, Brian brian.min...@colorado.edu wrote:

 Not likely. I've been banned from Google's regular search at least a dozen
 times during semi-frenetic search sprees in which I was identified as a
 bot.
 There is no doubt that if you try to automate it you will be quickly shot
 down.


Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Brian
That is against the law. It violates Google's ToS.

I'm mostly complaining that Google is being Very Evil. There is nothing we
can do about it except complain to them. Which I don't know how to do - they
apparently believe that the plain text versions of their books are akin to
their intellectual property and are unwilling to give them away.

On Sat, Jun 20, 2009 at 12:34 PM, Falcorian
alex.public.account+wikimediamailingl...@gmail.com wrote:

 So the bot just has to run at human speeds so it does not get banned; it
 still won't get tired or make unpredictable mistakes. And you can run it
 from different IPs to parallelize.

 --Falcorian


Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Geoffrey Plourde
For some reason, I am reminded of a Supreme Court case about the information in 
telephone directories. Maybe because of the insanity of trying to put public 
domain material under copyright. 





From: Brian brian.min...@colorado.edu
To: Wikimedia Foundation Mailing List foundation-l@lists.wikimedia.org
Sent: Saturday, June 20, 2009 11:47:28 AM
Subject: Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative 
Open Access Repository for Legal Scholarship

That is against the law. It violates Google's ToS.

I'm mostly complaining that Google is being Very Evil. There is nothing we
can do about it except complain to them. Which I don't know how to do - they
apparently believe that the plain text versions of their books are akin to
their intellectual property and are unwilling to give them away.


Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Geoffrey Plourde
For Supreme Court cases, would it be possible to have a bot pull the audio 
decisions from Oyez, and convert them into text?





From: David Gerard dger...@gmail.com
To: Wikimedia Foundation Mailing List foundation-l@lists.wikimedia.org
Sent: Saturday, June 20, 2009 8:41:45 AM
Subject: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open 
Access Repository for Legal Scholarship

http://blogs.law.harvard.edu/infolaw/2009/06/19/using-wikisource-as-an-alternative-open-access-repository-for-legal-scholarship/

Interesting. How well does this fit with what Wikisource does?


- d.



Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Parker Higgins
Except Google isn't asserting any kind of copyright control over these
books; they're just not making it convenient to download them in your
preferred format.  Maybe not The Right Thing, but not as boneheaded as suing
a party who reprints public domain material, as was the case in Feist v.
Rural (the Supreme Court case you mention).

Sent from my portable e-mail unit

On Jun 20, 2009 3:23 PM, Geoffrey Plourde geo.p...@yahoo.com wrote:

For some reason, I am reminded of a Supreme Court case about the information
in telephone directories. Maybe because of the insanity of trying to put
public domain material under copyright.





From: Brian brian.min...@colorado.edu
To: Wikimedia Foundation Mailing List foundation-l@lists.wikimedia.org
Sent: Saturday, June 20, 2009 11:47:28 AM
Subject: Re: [Foundation-l] Info/Law blog: Using Wikisource as an
Alternative Open Access Repository for Legal Scholarship

That is against the law. It violates Google's ToS. I'm mostly complaining
that Google is being Ver...


Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Anthony
Wow, what's Wikipedia's policy about using a bot to scrape everything?

On Sat, Jun 20, 2009 at 2:47 PM, Brian brian.min...@colorado.edu wrote:

 That is against the law. It violates Google's ToS.

 I'm mostly complaining that Google is being Very Evil. There is nothing we
 can do about it except complain to them. Which I don't know how to do - they
 apparently believe that the plain text versions of their books are akin to
 their intellectual property and are unwilling to give them away.


Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Brian
On Sat, Jun 20, 2009 at 1:29 PM, Platonides platoni...@gmail.com wrote:

 Where does it forbid them?


5.3 You agree not to access (or attempt to access) any of the Services by
any means other than through the interface that is provided by Google,
unless you have been specifically allowed to do so in a separate agreement
with Google. You specifically agree not to access (or attempt to access) any
of the Services through any automated means (including use of scripts or web
crawlers) and shall ensure that you comply with the instructions set out in
any robots.txt file present on the Services.
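Clause 5.3 refers to robots.txt. For readers unfamiliar with it, here is a minimal sketch of how a well-behaved client can check robots.txt rules with Python's standard `urllib.robotparser`. The rules and URLs in the example are made up, not Google's actual robots.txt, the helper names are hypothetical, and honoring robots.txt does not by itself satisfy a terms-of-service clause that forbids automated access altogether.

```python
from urllib.robotparser import RobotFileParser

def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

def allowed(parser: RobotFileParser, agent: str, urls) -> list:
    """Keep only the URLs this user agent is permitted to fetch."""
    return [u for u in urls if parser.can_fetch(agent, u)]

def delay_seconds(parser: RobotFileParser, agent: str, default: float = 1.0) -> float:
    """Use the site's Crawl-delay if it declares one, else a conservative default."""
    delay = parser.crawl_delay(agent)
    return float(delay) if delay is not None else default
```

For instance, with the rules "User-agent: *", "Disallow: /books" and "Crawl-delay: 5", a client would skip any `/books` URL and wait five seconds between the requests it is allowed to make.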


Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Geoffrey Plourde
If a bot has a meaningful effect on server load (i.e. page requests), it falls 
under the category of malicious software, which is highly illegal.





From: Ray Saintonge sainto...@telus.net
To: Wikimedia Foundation Mailing List foundation-l@lists.wikimedia.org
Sent: Saturday, June 20, 2009 2:35:52 PM
Subject: Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative 
Open Access Repository for Legal Scholarship

Brian wrote:
 That is against the law. It violates Google's ToS.

 I'm mostly complaining that Google is being Very Evil. There is nothing we
 can do about it except complain to them. Which I don't know how to do - they
 apparently believe that the plain text versions of their books are akin to
 their intellectual property and are unwilling to give them away.

  
How is violating Google's ToS against the law?  Sites put all sorts of 
meaningless garbage into these documents, and users mostly ignore them.

Of course Google's evil; it's about time that people noticed that.  They 
use their deep pockets as a way to bully other sites ... with a smile. 
Fortunately the U.S. does not have database protection laws like the 
E.U.  Ideally, every PD item they host should also be hosted on an 
alternative site, but that's a massive undertaking, ... and they know 
it.  Nothing requires them to be nice to the competition, such as by 
making it easy to copy their material.

Ec



Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Ray Saintonge
Anthony wrote:
 Wow, what's Wikipedia's policy about using a bot to scrape everything?
   

I don't know about any policy, but I think it should still be 
discouraged.  For me this has less to do with predation on other sites 
than with our inability to keep up with the volume of data that would be 
produced.  Proofreading and wikifying are labour-intensive processes.  
It is very easy for the technically minded to bring the scan and OCR of 
a 500-page book under our roof, but without the manpower to bring the 
added value these processes are scarcely better than data dumps.

Ec
 On Sat, Jun 20, 2009 at 2:47 PM, Brian brian.min...@colorado.edu wrote:
   
 That is against the law. It violates Google's ToS.

 I'm mostly complaining that Google is being Very Evil. There is nothing we
 can do about it except complain to them. Which I don't know how to do - they
 apparently believe that the plain text versions of their books are akin to
 their intellectual property and are unwilling to give them away.

 On Sat, Jun 20, 2009 at 12:34 PM, Falcorian wrote:
 
 So the bot just has to run at human speeds so it does not get banned; it
 still won't get tired or make unpredictable mistakes. And you can run it
 from different IPs to parallelize.

 --Falcorian




Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Ray Saintonge
Geoffrey Plourde wrote:
 If a bot has a meaningful effect on server load (i.e. page requests), it 
 falls under the category of malicious software, which is highly illegal.
   
Malicious software or overloading servers goes well beyond ignoring a 
ToS.  Why should downloading whole books from Google have any greater 
effect on server load than downloading a whole book of similar length 
from Internet Archive?

Ec


 
   




Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Stephen Bain
On Sun, Jun 21, 2009 at 5:27 AM, Parker Higgins parkerhigg...@gmail.com wrote:
 Except Google isn't asserting any kind of copyright control over these
 books, they're just not making it convenient to download them in your
 preferred format.  Maybe not The Right Thing, but not as boneheaded as suing
 a party who reprints public domain material, as was the case in Feist v.
 Rural (the Supreme Court case you mention).

They want people to use their service. Fair enough, given that the
scanning and OCRing happened on their dime.

-- 
Stephen Bain
stephen.b...@gmail.com



Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Geoffrey Plourde
A bot or bots calling up massive amounts of data at high speed can have a 
negative effect on a server. While I doubt the bot we use would have the power 
to take down a Google server, the speed of the requests and the constant number 
of requests will definitely be noticeable, possibly leading to unpleasant 
consequences. 





From: Ray Saintonge sainto...@telus.net
To: Wikimedia Foundation Mailing List foundation-l@lists.wikimedia.org
Sent: Saturday, June 20, 2009 5:07:44 PM
Subject: Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative 
Open Access Repository for Legal Scholarship

Geoffrey Plourde wrote:
 If a bot has a meaningful effect on server load (i.e. page requests), it 
 falls under the category of malicious software, which is highly illegal.
  
Malicious software or overloading servers goes well beyond ignoring a 
ToS.  Why should downloading whole books from Google have any greater 
effect on server load than downloading a whole book of similar length 
from Internet Archive?

Ec


 
  




Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Ray Saintonge
Geoffrey Plourde wrote:
 A bot or bots calling up massive amounts of data at high speed can have a 
 negative effect on a server. While I doubt the bot we use would have the 
 power to take down a Google server, the speed of the requests and the 
 constant number of requests will definitely be noticeable, possibly leading 
 to unpleasant consequences. 
   
And data accumulation at such a high speed would also be more than could 
be properly handled at the Wikisource end.  We regularly get 
whole works from Internet Archive and other sources, without any such 
problems arising.  I would not reasonably expect a greater accumulation 
rate from Google.

Ec

 _
 From: Ray Saintonge sainto...@telus.net


 Geoffrey Plourde wrote:
   
 If a bot has a meaningful effect on server load (i.e. page requests), it 
 falls under the category of malicious software, which is highly illegal.
  
 
 Malicious software or overloading servers goes well beyond ignoring a 
 ToS.  Why should downloading whole books from Google have any greater 
 effect on server load than downloading a whole book of similar length 
 from Internet Archive?

 Ec


   
 
  
 

   




Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Ray Saintonge
Stephen Bain wrote:
 On Sun, Jun 21, 2009 at 5:27 AM, Parker Higgins parkerhigg...@gmail.com wrote:
   
 Except Google isn't asserting any kind of copyright control over these
 books, they're just not making it convenient to download them in your
 preferred format.  Maybe not The Right Thing, but not as boneheaded as suing
 a party who reprints public domain material, as was the case in Feist v.
 Rural (the Supreme Court case you mention).
 
 They want people to use their service. Fair enough, given that the
 scanning and OCRing happened on their dime.

   
How does that give them any special rights?  There are no database 
protection laws in the US, and sweat-of-the-brow has been rejected as a 
basis for new copyrights.

Ec

