Re: [CODE4LIB] screen scraping

2011-10-03 Thread Simon Spero
On Oct 3, 2011 9:19 AM, Ed Summers e...@pobox.com wrote:

 On Sun, Oct 2, 2011 at 10:32 PM, Ken Irwin kir...@wittenberg.edu wrote:
  1. respect robots.txt

Disclaimer: I am not a lawyer.

Remember that robots.txt applies only to recursive web crawlers, and not to
screen-scraping per se. In cases where it does apply, it has limited legal
effect, but ignoring it is not cricket.

Important considerations are: is access to the site governed by a license
that prohibits the activity; is the content being scraped subject to
copyright, and if so, is the screen scraping covered by one of the
exceptions to exclusive rights of the copyright holder; is the
screen-scraping activity disruptive and damaging to the site being used
(trespass to chattels, etc.)?

A bit of reflection on the Golden Rule is probably more important
than pondering the legality of what you are doing.

Ed invoking philosophy? With citation? (wikipedia still counts) :-p

The usual objection to the golden rule applies here: just because you have
no objection to a screen scraper being used on your own site doesn't
automatically imply that others are willing to have their sites scraped.

Simon


Re: [CODE4LIB] screen scraping

2011-10-03 Thread Nate Vack
On Sun, Oct 2, 2011 at 9:35 PM, Reese, Terry
terry.re...@oregonstate.edu wrote:
 In Canada, the BC Supreme Court ruled that screen scraping real estate 
 listings from one site and using them on another indeed infringed on 
 copyright.  Not sure if this would cover your use -- but if you are coming 
 from Canada, it might be something to consider.

 Decision URL: 
 http://www.canlii.org/en/bc/bcsc/doc/2011/2011bcsc1196/2011bcsc1196.html

If you read the decision, it looks as though the content found to be
infringing was the property's description and photograph, which are
creative works.

Indexing factual data about a property *only* (asking price, address,
square footage, etc) may have been on stronger legal footing.

Regards,
-Nate


Re: [CODE4LIB] screen scraping

2011-10-03 Thread Genny Engel
Another reason to check with the webmaster, all legalities aside, is that their 
top ten list might actually be built from an RSS feed that, for whatever 
reason, they don't offer directly as a feed (or they do, but it wasn't 
obvious to you where that feed was to be found).  They might prefer you grab 
the feed rather than scrape the screen.  I don't actually have any feed-based 
pages on our site that aren't also available as feeds -- but some people might. 
Also, for usage statistics reasons, I'd rather have bots hitting the feeds 
instead of the pages.
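
Before scraping, it may be worth checking whether the page already advertises a feed. The sketch below looks for the common "feed autodiscovery" convention, a `<link rel="alternate">` tag with an RSS or Atom type in the page's HTML. The helper names and the example.org URLs are assumptions for illustration only, and a site can still have a feed that it does not advertise this way.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types conventionally used to advertise RSS and Atom feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect feed URLs advertised via <link rel="alternate" ...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # used to resolve relative href values
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute values can be None (e.g. <link rel>), so normalise them.
        attributes = {name: (value or "") for name, value in attrs}
        rel_values = attributes.get("rel", "").lower().split()
        link_type = attributes.get("type", "").lower()
        href = attributes.get("href")
        if "alternate" in rel_values and link_type in FEED_TYPES and href:
            self.feeds.append(urljoin(self.base_url, href))


def find_feeds(html, base_url):
    """Return absolute URLs of any feeds advertised in the given HTML."""
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    return finder.feeds


if __name__ == "__main__":
    # Hypothetical page markup; a real script would fetch the page first.
    sample = (
        '<html><head>'
        '<link rel="alternate" type="application/rss+xml" href="/feeds/top-ten.xml">'
        '</head><body></body></html>'
    )
    print(find_feeds(sample, "http://www.example.org/lists/"))
```

If this returns nothing, that only means no feed is advertised in the markup; asking the webmaster is still the more reliable route.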

Genny Engel
Sonoma County Library
gen...@sonoma.lib.ca.us
707 545-0831 x581
www.sonomalibrary.org


-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Nate 
Hill
Sent: Sunday, October 02, 2011 7:23 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] screen scraping

A question: what are the 'rules' around screen scraping?
If one site doesn't offer an RSS feed and you want to grab (for example)
their weekly top ten list with a script and then redisplay it on another
site, is that bad form?  Or even illegal?
Thanks-
Nate


-- 
Nate Hill
nathanielh...@gmail.com
http://www.natehill.net


Re: [CODE4LIB] screen scraping

2011-10-02 Thread Ken Irwin
I don't know that there are too many rules about this, but here's what comes to 
mind for me:

1. respect robots.txt
2. cache content so you don't hit their site more often than is reasonable. 
(I'd say that once a day is pretty reasonable)
3. also cache, or use a mockup or something, when you're writing your code, so 
you're not pounding them with live hits while you're working out the bugs.
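
A minimal sketch of points 1 and 2, assuming Python's standard library only. The bot name, cache file name, and one-day limit are placeholders chosen for illustration, not anything a particular site requires. The same cache also covers point 3: while you are debugging, the script keeps reading the saved copy instead of hitting the live site.

```python
import time
import urllib.robotparser
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import Request, urlopen

USER_AGENT = "example-top-ten-bot"        # hypothetical bot name
CACHE_FILE = Path("top_ten_cache.html")   # hypothetical cache location
MAX_AGE_SECONDS = 24 * 60 * 60            # "once a day is pretty reasonable"


def allowed_by_robots(url, robots_lines=None, user_agent=USER_AGENT):
    """Return True if robots.txt permits user_agent to fetch url.

    robots_lines lets you pass the robots.txt content directly (handy for
    offline testing); otherwise it is downloaded from the site's root.
    """
    parser = urllib.robotparser.RobotFileParser()
    if robots_lines is None:
        parts = urlparse(url)
        parser.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
        parser.read()
    else:
        parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


def cache_is_fresh(path, max_age=MAX_AGE_SECONDS, now=None):
    """Return True if the cached copy exists and is younger than max_age."""
    if not path.exists():
        return False
    now = time.time() if now is None else now
    return (now - path.stat().st_mtime) < max_age


def fetch_politely(url, cache_path=CACHE_FILE):
    """Fetch url at most once per MAX_AGE_SECONDS, honouring robots.txt."""
    if cache_is_fresh(cache_path):
        return cache_path.read_text(encoding="utf-8")
    if not allowed_by_robots(url):
        raise PermissionError(f"robots.txt disallows fetching {url}")
    request = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(request, timeout=30) as response:
        html = response.read().decode("utf-8", errors="replace")
    cache_path.write_text(html, encoding="utf-8")
    return html
```

Calling `fetch_politely("http://www.example.org/top-ten")` (a made-up URL) repeatedly within a day would touch the remote site only once; every later call reads the cached file.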

As far as legality, I'm gonna leave that to someone else. Citation is, of 
course, a really good start.

Ken


Re: [CODE4LIB] screen scraping

2011-10-02 Thread Roberto Hoyle

On 10/2/2011 10:23 PM, Nate Hill wrote:

A question: what are the 'rules' around screen scraping?
If one site doesn't offer an RSS feed and you want to grab (for example)
their weekly top ten list with a script and then redisplay it on another
site, is that bad form?  Or even illegal?


If the site in question depends on advertising, what you are suggesting 
would be seriously uncool.


If you don't get their approval and it's not for personal use, it may be 
a copyright violation also.


r.


Re: [CODE4LIB] screen scraping

2011-10-02 Thread Nate Hill
I think what I'm hearing here is that it would be a good idea to ask a
webmaster on the other end if it's OK.
Advertising... Roberto, good point I hadn't thought of that.  Thanks.

-- 
Nate Hill
nathanielh...@gmail.com
http://www.natehill.net


Re: [CODE4LIB] screen scraping

2011-10-02 Thread Tracy Seneca
I don’t know how well this applies to your specific use of screen-scraping, but 
for libraries’ broader use of crawlers to build archives, the Section 108 Study 
Group Recommendations are a good source of guidance (though not law).  They 
propose specific copyright exceptions for libraries in regard to collecting and 
archiving “publicly accessible online content”.  Their recommendations are 
clear and sensible; they run from page 80 to page 87 of the report.

http://www.section108.gov/docs/Sec108StudyGroupReport.pdf

Tracy Seneca
California Digital Library


