robots.txt Disallow not respected

mabi Sun, 10 Dec 2017 14:16:54 -0800

Hello,

I am crawling my website with Nutch 2.3.1, and Nutch does not respect the 
Disallow rule in my site's robots.txt. The robots.txt file is very simple:


User-agent: *
Disallow: /wpblog/feed/

Still, the /wpblog/feed/ URL gets parsed and eventually indexed.
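
To rule out a problem with the rule itself (as opposed to Nutch), here is a 
minimal sketch using Python's standard urllib.robotparser. It says nothing 
about Nutch's own robots.txt handling. The host example.com and the agent 
name "my-nutch-agent" are placeholders, not values from my setup.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content quoted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wpblog/feed/
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Placeholder host and agent name, for illustration only.
    for url in ("http://example.com/wpblog/feed/", "http://example.com/wpblog/"):
        print(url, is_allowed(ROBOTS_TXT, "my-nutch-agent", url))
```

If the feed URL comes back as not allowed here, the rule is well-formed and 
the question is only why the crawler does not apply it.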

Do I need to enable anything special in the nutch-site.xml config file?

Thanks,
Mabi
