Hello,

I am testing crawling (with the bin/crawl command) on www.webhostingtalk.pl,

and it looks like the crawler fetches many disallowed URLs.

For example (there are many more), robots.txt contains:
Disallow: /index.php?showuser

but the crawler fetched and indexed:
http://www.webhostingtalk.pl/index.php?showuser=6470
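
For reference, a quick standalone check of that URL against the rule classifies it as disallowed. This is a minimal sketch using crawler-commons' SimpleRobotRulesParser as a stand-in for Nutch's own robots.txt handling (the robots.txt excerpt and the "Nutch" agent name are illustrative):

import crawlercommons.robots.BaseRobotRules;
import crawlercommons.robots.SimpleRobotRulesParser;

import java.nio.charset.StandardCharsets;

public class RobotsCheck {
    public static void main(String[] args) {
        // Relevant excerpt from www.webhostingtalk.pl/robots.txt
        String robotsTxt = "User-agent: *\nDisallow: /index.php?showuser\n";

        SimpleRobotRulesParser parser = new SimpleRobotRulesParser();
        BaseRobotRules rules = parser.parseContent(
                "http://www.webhostingtalk.pl/robots.txt",
                robotsTxt.getBytes(StandardCharsets.UTF_8),
                "text/plain",
                "Nutch");  // agent name is illustrative

        String url = "http://www.webhostingtalk.pl/index.php?showuser=6470";
        // Disallow rules are prefix matches on path + query,
        // so this should print: allowed = false
        System.out.println("allowed = " + rules.isAllowed(url));
    }
}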

I didn't find any existing issue filed for this problem.

I am using trunk from last week.


I think Nutch should obey all robots.txt rules in 1.0.

Regards,
Bartosz
