Hello Christian,

> we've got a problem using Nutch: On the website that has to be crawled, there
> is a navigation on top of each page. Nutch crawls the navigation of each page
> which leads to the situation that for certain queries (that are included in
> the navigation) every page is delivered as a result.

We have always used the blacklist-whitelist plugin for this. It lets you specify which tags, ids, and classes in your HTML should be whitelisted or blacklisted.

http://lucene.472066.n3.nabble.com/HTML-tag-filtering-td4116686.html

Here is a version compiled for Nutch 1.12 with Java 8:

https://aarboard.oncloud7.ch/index.php/s/MfFDlsUBWMWW5ZM

André
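
To illustrate the general idea of blacklisting by tag, id, or class, here is a minimal standalone sketch. It is not the plugin's code and does not use any Nutch API: the class name `BlacklistSketch`, the rule syntax (`tag`, `tag#id`, `tag.class`), and the sample rules are all assumptions made for this example, and it only handles well-formed XHTML because it relies on the JDK's XML parser.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class BlacklistSketch {
    // Hypothetical rules: a bare tag, tag#id, or tag.class (assumed syntax).
    static final Set<String> BLACKLIST =
        new HashSet<>(Arrays.asList("nav", "div#menu", "ul.navigation"));

    // True if the element matches a rule by tag, by tag#id, or by tag.class.
    static boolean isBlacklisted(Element e) {
        String tag = e.getTagName().toLowerCase();
        if (BLACKLIST.contains(tag)) return true;
        String id = e.getAttribute("id");
        if (!id.isEmpty() && BLACKLIST.contains(tag + "#" + id)) return true;
        for (String cls : e.getAttribute("class").trim().split("\\s+")) {
            if (!cls.isEmpty() && BLACKLIST.contains(tag + "." + cls)) return true;
        }
        return false;
    }

    // Depth-first walk that collects text but skips blacklisted subtrees.
    static void collectText(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                if (out.length() > 0) out.append(' ');
                out.append(text);
            }
            return;
        }
        if (node.getNodeType() == Node.ELEMENT_NODE && isBlacklisted((Element) node)) {
            return; // e.g. the site navigation: nothing below it gets indexed
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            collectText(c, out);
        }
    }

    // Returns the text that would remain for indexing after filtering.
    static String indexableText(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
            StringBuilder out = new StringBuilder();
            collectText(doc.getDocumentElement(), out);
            return out.toString();
        } catch (Exception ex) {
            throw new RuntimeException("Input must be well-formed XHTML", ex);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body>"
            + "<div id=\"menu\"><a href=\"/\">Home</a> <a href=\"/products\">Products</a></div>"
            + "<p>Opening hours and directions.</p>"
            + "</body></html>";
        System.out.println(indexableText(page)); // prints "Opening hours and directions."
    }
}
```

With the navigation subtree skipped, a query for "Products" would no longer match this page just because the word appears in the menu. A whitelist would work the other way round: only text inside matching elements would be collected.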

