Hello, I am crawling my website with Nutch 2.3.1 and somehow Nutch does not respect the Disallow rule in my site's robots.txt. I have the following very simple robots.txt file:
User-agent: *
Disallow: /wpblog/feed/

Still, the /wpblog/feed/ URL gets parsed and finally indexed. Do I need to enable anything special in the nutch-site.xml config file maybe?

Thanks,
Mabi
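In case it is relevant: my understanding is that Nutch matches robots.txt groups against the agent names configured in nutch-site.xml (http.agent.name and http.robots.agents), so I have been double-checking those. A minimal sketch of what I mean (the agent name "mycrawler" is just a placeholder, not my real value):

```xml
<!-- nutch-site.xml sketch; "mycrawler" is a placeholder agent name -->
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>mycrawler</value>
  </property>
  <property>
    <!-- agent strings checked against robots.txt groups -->
    <name>http.robots.agents</name>
    <value>mycrawler,*</value>
  </property>
</configuration>
```

Since my robots.txt uses "User-agent: *", I would expect any configured agent name to match the Disallow rule anyway.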

