Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-23 Thread Peter Gervai
On Tue, Jun 23, 2009 at 03:15, Platonides platoni...@gmail.com wrote: Although not trivial, downloading all images is in fact quite easy. You can find scripts to do that already made. You can also ask Brion to rsync3 them. But do you have enough space to dedicate? How many wikis do you want to

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-23 Thread Samuel Klein
Yes, but my understanding is that while google provided part of the mbp data and scans, its continued updates to ocr since then are not being shared. I would be glad to learn this was not the case... samuel klein. s...@laptop.org. +1 617 529 4266 On Jun 21, 2009 3:14 AM, Nikola Smolenski

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-23 Thread Anthony
On Mon, Jun 22, 2009 at 9:15 PM, Platonides platoni...@gmail.com wrote: Anthony wrote: (although I still haven't seen the WMF step up to the plate and make it easy for people to make a full history fork, or even to download all the images) You'll find full history dumps of almost all

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-23 Thread Brian
2009/6/23 Samuel Klein meta...@gmail.com Yes, but my understanding is that while google provided part of the mbp data and scans, its continued updates to ocr since then are not being shared. I would be glad to learn this was not the case... The dataset you need to train an OCR system to be

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-23 Thread Michael Snow
Brian wrote: 2009/6/23 Samuel Klein meta...@gmail.com Yes, but my understanding is that while google provided part of the mbp data and scans, its continued updates to ocr since then are not being shared. I would be glad to learn this was not the case... The dataset you need to

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-23 Thread Brian
On Tue, Jun 23, 2009 at 11:44 AM, Michael Snow wikipe...@verizon.net wrote: The dataset you need to train an OCR system to be as good as theirs is the raw images and the plain text. They aren't making it easy to get either of those things :( They have presumably improved the software in

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-23 Thread Michael Snow
Brian wrote: On Tue, Jun 23, 2009 at 11:44 AM, Michael Snow wikipe...@verizon.net wrote: The dataset you need to train an OCR system to be as good as theirs is the raw images and the plain text. They aren't making it easy to get either of those things :( They

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-23 Thread Brian
Ok Shakespeare. But in plain English you appear to be saying that corporations are inherently greedy and have a tendency to be evil. Sure, but we expect more out of GOOG. This is not MSFT we are talking about. On Tue, Jun 23, 2009 at 12:13 PM, Michael Snow wikipe...@verizon.net wrote: Brian

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-23 Thread Anthony
On Tue, Jun 23, 2009 at 1:09 PM, Brian brian.min...@colorado.edu wrote: 2009/6/23 Samuel Klein meta...@gmail.com Yes, but my understanding is that while google provided part of the mbp data and scans, its continued updates to ocr since then are not being shared. I would be glad to

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-23 Thread Anthony
On Tue, Jun 23, 2009 at 2:24 PM, Brian brian.min...@colorado.edu wrote: Ok Shakespeare. But in plain English you appear to be saying that corporations are inherently greedy and have a tendency to be evil. Sure, but we expect more out of GOOG. This is not MSFT we are talking about. Of course

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-23 Thread Anthony
On Tue, Jun 23, 2009 at 3:58 PM, Anthony wikim...@inbox.org wrote: On Tue, Jun 23, 2009 at 2:24 PM, Brian brian.min...@colorado.edu wrote: Ok Shakespeare. But in plain English you appear to be saying that corporations are inherently greedy and have a tendency to be evil. Sure, but we expect

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-23 Thread John Vandenberg
On Wed, Jun 24, 2009 at 6:10 AM, Anthony wikim...@inbox.org wrote: On Tue, Jun 23, 2009 at 3:58 PM, Anthony wikim...@inbox.org wrote: On Tue, Jun 23, 2009 at 2:24 PM, Brian brian.min...@colorado.edu wrote: Ok Shakespeare. But in plain English you appear to be saying that corporations

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-22 Thread Platonides
Anthony wrote: On Sun, Jun 21, 2009 at 7:54 AM, John Vandenberg jay...@gmail.com wrote: Whether Google is good or evil is off-topic, and irrelevant to boot. Whether or not they have a right to exclude bots isn't. Also worth noting, Project Gutenberg has digitised less than 30,000 books

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-22 Thread Mark Wagner
On Sat, Jun 20, 2009 at 14:35, Ray Saintonge sainto...@telus.net wrote: Brian wrote: That is against the law. It violates Google's ToS. I'm mostly complaining that Google is being Very Evil. There is nothing we can do about it except complain to them. Which I don't know how to do - they

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-22 Thread Dan Rosenthal
The statute supports that as well, providing a private right of action and civil remedy. It's not entirely that cut and dry (there are certain restrictions that must be met) but yeah, it appears that in some cases TOS violations can be illegal. -Dan On Jun 22, 2009, at 7:49 PM, Mark Wagner

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-22 Thread Platonides
Anthony wrote: (although I still haven't seen the WMF step up to the plate and make it easy for people to make a full history fork, or even to download all the images) You'll find full history dumps of almost all wikis at http://download.wikimedia.org/ Although not trivial, downloading all

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread Ray Saintonge
Samuel Klein wrote: There is a wealth of work done all the time by primary source researchers and publishers, which could be improved on by having wikisource entries, translations, &c. Related question: how appropriate would large numbers of public domain texts, with page scans and the best

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread Nikola Smolenski
On Saturday 20 June 2009 18:29:24 Brian wrote: This has reminded me to complain about Google Books. Google has the world's best OCR (in virtue of having the largest OCR'able dataset) and also has a mission to scan in all the public domain books they can get their hands on. They recently

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread John Vandenberg
On Sun, Jun 21, 2009 at 1:41 AM, David Gerard dger...@gmail.com wrote: http://blogs.law.harvard.edu/infolaw/2009/06/19/using-wikisource-as-an-alternative-open-access-repository-for-legal-scholarship/ Interesting. How well does this fit with what Wikisource does? Tim Armstrong is a sysop on

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread Anthony
On Sun, Jun 21, 2009 at 1:51 AM, Ray Saintonge sainto...@telus.net wrote: Stephen Bain wrote: On Sun, Jun 21, 2009 at 5:27 AM, Parker Higgins parkerhigg...@gmail.com wrote: Except google isn't asserting any kind of copyright control over these books, they're just not making it convenient

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread Anthony
On Sun, Jun 21, 2009 at 7:17 AM, Anthony wikim...@inbox.org wrote: (*) Personally, I'm of the opinion that merely accessing a website is not sufficient to bind a websurfer to a TOS, and that at most a TOS which you do not have to even click agree to is a unilateral contract which can only

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread John Vandenberg
On Sun, Jun 21, 2009 at 9:17 PM, Anthony wikim...@inbox.org wrote: On Sun, Jun 21, 2009 at 1:51 AM, Ray Saintonge sainto...@telus.net wrote: Stephen Bain wrote: On Sun, Jun 21, 2009 at 5:27 AM, Parker Higgins parkerhigg...@gmail.com wrote: Except google isn't asserting any kind of

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread Anthony
On Sun, Jun 21, 2009 at 7:54 AM, John Vandenberg jay...@gmail.com wrote: Whether Google is good or evil is off-topic, and irrelevant to boot. Whether or not they have a right to exclude bots isn't. Also worth noting, Project Gutenberg has digitised less than 30,000 books since 1971.

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread John Vandenberg
On Sun, Jun 21, 2009 at 10:07 PM, Anthony wikim...@inbox.org wrote: On Sun, Jun 21, 2009 at 7:54 AM, John Vandenberg jay...@gmail.com wrote: Whether Google is good or evil is off-topic, and irrelevant to boot. Whether or not they have a right to exclude bots isn't. Actually, it is. This

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread Anthony
On Sun, Jun 21, 2009 at 8:35 AM, John Vandenberg jay...@gmail.com wrote: I suggest you take a look at a few of the DJVU files provided by Internet Archive. Then you can point out real faults that you see. I will. My apologies for misunderstanding your email.

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread Anthony
On Sun, Jun 21, 2009 at 10:23 AM, Anthony wikim...@inbox.org wrote: On Sun, Jun 21, 2009 at 8:35 AM, John Vandenberg jay...@gmail.com wrote: I suggest you take a look at a few of the DJVU files provided by Internet Archive. Then you can point out real faults that you see. I will. My

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread Anthony
On Sun, Jun 21, 2009 at 10:55 AM, Anthony wikim...@inbox.org wrote: On Sun, Jun 21, 2009 at 10:23 AM, Anthony wikim...@inbox.org wrote: On Sun, Jun 21, 2009 at 8:35 AM, John Vandenberg jay...@gmail.com wrote: I suggest you take a look at a few of the DJVU files provided by Internet Archive.

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread John Vandenberg
On Sun, Jun 21, 2009 at 1:41 AM, David Gerard dger...@gmail.com wrote: http://blogs.law.harvard.edu/infolaw/2009/06/19/using-wikisource-as-an-alternative-open-access-repository-for-legal-scholarship/ Interesting. How well does this fit with what Wikisource does? Here are seven articles from

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-21 Thread Ray Saintonge
Anthony wrote: On Sun, Jun 21, 2009 at 10:55 AM, Anthony wrote: Okay, http://www.archive.org/details/catholicencyclo16herbgoog happened to be the first book I randomly picked from Google Book Search. There's no text version. And the text version I find of other editions seems to be much

[Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread David Gerard
http://blogs.law.harvard.edu/infolaw/2009/06/19/using-wikisource-as-an-alternative-open-access-repository-for-legal-scholarship/ Interesting. How well does this fit with what Wikisource does? - d.

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Samuel Klein
There is a wealth of work done all the time by primary source researchers and publishers, which could be improved on by having wikisource entries, translations, &c. Related question: how appropriate would large numbers of public domain texts, with page scans and the best available OCR [and

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Brian
This has reminded me to complain about Google Books. Google has the world's best OCR (in virtue of having the largest OCR'able dataset) and also has a mission to scan in all the public domain books they can get their hands on. They recently updated their interface to, as they put it, make it easier

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Platonides
Brian wrote: Unfortunately the only way I've found to download the full text of a public domain book from Google is to flip through the book a page at a time, copying the text to your clipboard. There are roughly 2-3 million public domain books in Google Books. That's easy to fix :)

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Brian
Not likely. I've been banned from Google's regular search at least a dozen times during semi-frenetic search sprees in which I was identified as a bot. There is no doubt that if you try to automate it you will be quickly shot down. On Sat, Jun 20, 2009 at 12:02 PM, Platonides platoni...@gmail.com

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Anthony
Easier than scanning, though :) On Sat, Jun 20, 2009 at 2:04 PM, Brian brian.min...@colorado.edu wrote: Not likely. I've been banned from Google's regular search at least a dozen times during semi-frenetic search sprees in which I was identified as a bot. There is no doubt that if you try to

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Falcorian
So the bot just has to run at human speeds so it does not get banned, it still won't get tired or make unpredictable mistakes. And you can run it from different IPs to parallelize. --Falcorian On Sat, Jun 20, 2009 at 11:04 AM, Brian brian.min...@colorado.edu wrote: Not likely. I've been banned

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Brian
That is against the law. It violates Google's ToS. I'm mostly complaining that Google is being Very Evil. There is nothing we can do about it except complain to them. Which I don't know how to do - they apparently believe that the plain text versions of their books are akin to their intellectual

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Geoffrey Plourde
Mailing List foundation-l@lists.wikimedia.org Sent: Saturday, June 20, 2009 11:47:28 AM Subject: Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship That is against the law. It violates Google's ToS. I'm mostly complaining that Google

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Geoffrey Plourde
, 2009 8:41:45 AM Subject: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship http://blogs.law.harvard.edu/infolaw/2009/06/19/using-wikisource-as-an-alternative-open-access-repository-for-legal-scholarship/ Interesting. How well does this fit

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Parker Higgins
domain material under copyright. From: Brian brian.min...@colorado.edu To: Wikimedia Foundation Mailing List foundation-l@lists.wikimedia.org Sent: Saturday, June 20, 2009 11:47:28 AM Subject: Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Anthony
Wow, what's Wikipedia's policy about using a bot to scrape everything? On Sat, Jun 20, 2009 at 2:47 PM, Brian brian.min...@colorado.edu wrote: That is against the law. It violates Google's ToS. I'm mostly complaining that Google is being Very Evil. There is nothing we can do about it except

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Brian
On Sat, Jun 20, 2009 at 1:29 PM, Platonides platoni...@gmail.com wrote: Where does it forbid them? 5.3 You agree not to access (or attempt to access) any of the Services by any means other than through the interface that is provided by Google, unless you have been specifically allowed to do so

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Geoffrey Plourde
Sent: Saturday, June 20, 2009 2:35:52 PM Subject: Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship Brian wrote: That is against the law. It violates Google's ToS. I'm mostly complaining that Google is being Very Evil

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Ray Saintonge
Anthony wrote: Wow, what's Wikipedia's policy about using a bot to scrape everything? I don't know about any policy, but I think it should still be discouraged. For me this has less to do with predation on other sites than with our inability to keep up with the volume of data that would

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Ray Saintonge
Geoffrey Plourde wrote: If a bot has a meaningful effect on server load (i.e. page requests), it falls under the category of malicious software, which is highly illegal. Malicious software or overloading servers goes well beyond ignoring a ToS. Why should downloading whole books from

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Stephen Bain
On Sun, Jun 21, 2009 at 5:27 AM, Parker Higgins parkerhigg...@gmail.com wrote: Except google isn't asserting any kind of copyright control over these books, they're just not making it convenient to download them in your preferred format. Maybe not The Right Thing, but not as boneheaded as suing

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Geoffrey Plourde
to unpleasant consequences. From: Ray Saintonge sainto...@telus.net To: Wikimedia Foundation Mailing List foundation-l@lists.wikimedia.org Sent: Saturday, June 20, 2009 5:07:44 PM Subject: Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Ray Saintonge
Geoffrey Plourde wrote: A bot or bots calling up massive amounts of data at high speed can have a negative effect on a server. While I doubt the bot we use would have the power to take down a Google server, the speed of the requests and the constant number of requests will definitely be

Re: [Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

2009-06-20 Thread Ray Saintonge
Stephen Bain wrote: On Sun, Jun 21, 2009 at 5:27 AM, Parker Higgins parkerhigg...@gmail.com wrote: Except google isn't asserting any kind of copyright control over these books, they're just not making it convenient to download them in your preferred format. Maybe not The Right Thing, but