Shouldn't be a problem if you're honouring the robots.txt.
The main legal issue I can think of is copyright infringement, but that only
applies if you're reproducing the content. If you're just analysing the content
and links and keeping to the robots.txt rules, I doubt you'll have a problem,
unless you're hitting the site every 10 minutes or so.
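As a rough illustration of "honouring the robots.txt", here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt text, the agent name MyNutchBot and the 600-second Crawl-delay (10 minutes) are hypothetical. A real crawler would fetch the site's actual /robots.txt instead of parsing a hard-coded string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only; a real crawler
# would fetch https://<site>/robots.txt (e.g. rp.set_url(...); rp.read()).
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 600
Disallow: /submit
Disallow: /user/
"""


def polite_check(robots_txt, agent, url):
    """Return (allowed, delay_seconds) for `url` under the given rules."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    allowed = rp.can_fetch(agent, url)
    # crawl_delay() returns None when no Crawl-delay applies to this agent.
    delay = rp.crawl_delay(agent)
    return allowed, delay


if __name__ == "__main__":
    for url in ("https://example.com/r/programming", "https://example.com/submit"):
        print(url, polite_check(ROBOTS_TXT, "MyNutchBot", url))
```

The delay value is what keeps a crawler from re-fetching the same site more often than the owner asked for.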
Wouldn't grabbing the RSS feed be better?
Would http://diggdot.us be a good example of what you're trying to do? Or have
I got the wrong idea entirely?
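To show why the RSS feed can be easier than crawling pages, here is a minimal sketch that pulls (title, link) pairs out of an RSS 2.0 document with Python's standard xml.etree.ElementTree. The feed below is a made-up sample. Fetching the real feed URL is left out, and Atom feeds, which use XML namespaces, would need different element names.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical RSS 2.0 snippet standing in for a site's real feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example links</title>
    <item><title>First story</title><link>http://example.com/a</link></item>
    <item><title>Second story</title><link>http://example.com/b</link></item>
  </channel>
</rss>"""


def feed_links(rss_xml: str) -> List[Tuple[str, str]]:
    """Return (title, link) for every <item> that has a non-empty <link>."""
    root = ET.fromstring(rss_xml)
    out = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if link:
            out.append((title, link))
    return out


if __name__ == "__main__":
    for title, link in feed_links(SAMPLE_FEED):
        print(title, link)
```

One feed request returns the submitted links directly, so there is no need to crawl and parse the site's HTML pages.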
Anyone else have any thoughts?
_gk
----- Original Message -----
From: "Berlin Brown" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Thursday, March 30, 2006 8:13 AM
Subject: Legal issues
What are, say, the legal issues of crawling a site like reddit, digg or
slashdot? Assume that you are just collecting the links that users post
through that service and then re-gathering those links; I
can't see an issue there.
The other extreme would be crawling Google and re-querying it, or something
along those lines.
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general