Hello,

I am crawling my website with Nutch 2.3.1, and Nutch does not seem to respect 
the Disallow rule in my site's robots.txt. The robots.txt file is very 
simple:

User-agent: *
Disallow: /wpblog/feed/

Still, the /wpblog/feed/ URL gets parsed and finally indexed.
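
As a sanity check of the rule itself, independent of Nutch, here is a minimal 
sketch using Python's standard urllib.robotparser. The host example.com and 
the agent name my-nutch-agent are placeholders, not my real values. It only 
shows how a standard robots.txt parser reads the two lines above and says 
nothing about what Nutch does internally.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content quoted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wpblog/feed/
"""


def is_allowed(url, agent="my-nutch-agent"):
    """Return True if `agent` may fetch `url` under ROBOTS_TXT.

    The agent name is a placeholder; with a `User-agent: *` rule,
    any name falls through to the same wildcard entry.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


# example.com is a placeholder host.
print(is_allowed("http://example.com/wpblog/feed/"))  # False: matches the Disallow prefix
print(is_allowed("http://example.com/wpblog/"))       # True: not covered by the rule
```

So the rule reads as expected to a standard parser, which is why I suspect a 
configuration issue on the Nutch side.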

Do I need to enable anything special in the nutch-site.xml config file?

Thanks,
Mabi