Hello,

I am testing crawling (with the bin/crawl command) on www.webhostingtalk.pl,

and it looks like the crawler fetches many disallowed URLs.

For example (there are many more), robots.txt contains:
Disallow: /index.php?showuser

but the crawler fetched and indexed:
http://www.webhostingtalk.pl/index.php?showuser=6470
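
For reference, a quick standalone check of that URL against the rule classifies it as disallowed. This is a minimal sketch using crawler-commons' SimpleRobotRulesParser as a stand-in for Nutch's own robots.txt handling (the robots.txt excerpt and the "Nutch" agent name are illustrative):

import crawlercommons.robots.BaseRobotRules;
import crawlercommons.robots.SimpleRobotRulesParser;

import java.nio.charset.StandardCharsets;

public class RobotsCheck {
    public static void main(String[] args) {
        // Relevant excerpt from www.webhostingtalk.pl/robots.txt
        String robotsTxt = "User-agent: *\nDisallow: /index.php?showuser\n";

        SimpleRobotRulesParser parser = new SimpleRobotRulesParser();
        BaseRobotRules rules = parser.parseContent(
                "http://www.webhostingtalk.pl/robots.txt",
                robotsTxt.getBytes(StandardCharsets.UTF_8),
                "text/plain",
                "Nutch");  // agent name is illustrative

        String url = "http://www.webhostingtalk.pl/index.php?showuser=6470";
        // Disallow rules are prefix matches on path + query,
        // so this should print: allowed = false
        System.out.println("allowed = " + rules.isAllowed(url));
    }
}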

I didn't find any existing issue filed for this problem.

I am using trunk from last week.


I think Nutch should obey all robots.txt rules in 1.0.

Regards,
Bartosz
