FWIW, I believe everything stated so far is correct - and I'd also
assume that since Google/MSN/Yahoo are all doing this, it's been legally
tested and found OK.
However, I know many people complain about the cache. Some see it as a
copyright violation - technically correct or not, the cache does
essentially duplicate their site and make it available online. I've
never seen a counter-argument beyond 'legally it's not'. IMO it's
cutting it pretty close.
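For what it's worth, a site owner who objects to the cache can opt out
of it without giving up indexing. As far as I know all three major
engines honor the noarchive robots meta tag:

    <meta name="robots" content="noarchive">

With that in the page head the engine can still index and rank the
page, it just won't show a 'Cached' link for it.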
The other issue some have with displaying a cache is that it lets
people pull down a website without ever visiting the site in question.
If I put serious effort into blocking bots and scrapers, for example,
but let the SEs in so I can get indexed, then the bots and scrapers can
completely bypass my efforts, visit the SE, and pull down the cached
pages there. They can then do nasty things with my content, like
copying it onto their own site. Not good, and that's why I don't show
the cache on my SE.
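As a concrete sketch, the 'let the SEs in, keep everyone else out'
policy usually starts with a robots.txt along these lines (this only
stops well-behaved crawlers, of course - the scrapers I'm worried about
ignore robots.txt, which is why the serious blocking has to happen
server-side):

    User-agent: Googlebot
    Disallow:

    User-agent: Slurp
    Disallow:

    User-agent: msnbot
    Disallow:

    User-agent: *
    Disallow: /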
g.
Dan Morrill wrote:
If I remember correctly, Google has been sued over this issue and won
a number of times. You can cache and you can search other people's web
sites - Groklaw has the data on this one. Searching and caching fall
under fair use and the idea of public access: as long as you are not
cracking passwords, you honor robots.txt, and they post the content on
the web, it is considered public in that regard.
I am not a lawyer - check Groklaw.
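To make the 'honor robots.txt' part concrete, here is a toy sketch in
Java (this list being about Nutch) - Nutch has its own robots handling
internally, so this is just an illustration that only understands the
wildcard (User-agent: *) block and simple Disallow prefixes:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class RobotsCheck {

    // Fetch http://host/robots.txt and collect the Disallow prefixes
    // that apply to every crawler ("User-agent: *").
    static List<String> disallowedPrefixes(String host) throws Exception {
        List<String> prefixes = new ArrayList<String>();
        URL robots = new URL("http://" + host + "/robots.txt");
        BufferedReader in = new BufferedReader(
                new InputStreamReader(robots.openStream()));
        try {
            boolean appliesToUs = false;
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.toLowerCase().startsWith("user-agent:")) {
                    appliesToUs = line.substring(11).trim().equals("*");
                } else if (appliesToUs
                        && line.toLowerCase().startsWith("disallow:")) {
                    String path = line.substring(9).trim();
                    if (path.length() > 0) {
                        prefixes.add(path);
                    }
                }
            }
        } finally {
            in.close();
        }
        return prefixes;
    }

    // A path is allowed unless some Disallow prefix matches it.
    static boolean allowed(List<String> disallowed, String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        // example.com is a placeholder host.
        List<String> rules = disallowedPrefixes("example.com");
        System.out.println("/some/page.html allowed? "
                + allowed(rules, "/some/page.html"));
    }
}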
r/d
-----Original Message-----
From: TDLN [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 30, 2006 3:34 AM
To: [email protected]
Subject: Re: Legal issues
Google's and Yahoo's Terms of Service provide interesting reading regarding
such legal issues.
http://www.google.com/terms_of_service.html
http://docs.yahoo.com/info/terms/
Rgrds, Thomas
On 3/30/06, gekkokid <[EMAIL PROTECTED]> wrote:
Shouldn't be a problem if you're honouring the robots.txt.
The legal issue could be stealing copyrighted material? That's only if
you're reproducing it, but if you're analysing the content and links and
keeping to the robots.txt rules, I doubt you'll have a problem unless
it's crawling every 10 minutes.
Wouldn't grabbing the RSS feed be better?
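If it helps, pulling the links out of a feed is only a few lines with
the JDK's built-in XML parser. Rough sketch below - the feed URL is a
placeholder, check the site for the real one:

import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class FeedLinks {
    public static void main(String[] args) throws Exception {
        // Parse the RSS feed straight off the wire (URL is made up).
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse("http://example.com/rss/index.xml");
        // Each <item> in an RSS 2.0 feed carries a <title> and a <link>.
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            String title = item.getElementsByTagName("title")
                    .item(0).getTextContent();
            String link = item.getElementsByTagName("link")
                    .item(0).getTextContent();
            System.out.println(title + " -> " + link);
        }
    }
}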
Would http://diggdot.us be a good example of what you're trying to do?
Or have I got the wrong idea entirely?
Anyone else have any thoughts?
_gk
----- Original Message -----
From: "Berlin Brown" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Thursday, March 30, 2006 8:13 AM
Subject: Legal issues
What are, say, the legal issues of crawling a site like reddit, digg or
slashdot? Assume that you are just collecting links that users post
through that service and then regathering those links. I can't see an
issue there.
The other extreme would be crawling Google and re-querying, or something
along those lines.