Thank you for the hint. I looked into this further and found the following thread:

http://mail-archives.apache.org/mod_mbox/nutch-user/201102.mbox/%[email protected]%3E

Can somebody tell me in exactly which file I have to add this filter?

    public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
            CrawlDatum datum, Inlinks inlinks) throws IndexingException {
        // Plain text extracted from the page by the parser
        String content = parse.getText();
        System.out.println("Content : " + content);
        System.out.println("Contains : " + content.contains("nutch"));
        if (content.contains("nutch")) {
            System.out.println("Nutch keyword found! Hence not indexing the doc :)");
            // Returning null drops the document from the index
            return null;
        }
        return doc;
    }

As an example, I simply want to exclude documents that contain the word "nutch".

I have also read these two tutorials:

http://www.attuneinfocom.com/blog/how-to-build-and-deploy-plugin-with-apache-nutch.html
http://florianhartl.com/nutch-plugin-tutorial.html

Thank you!
Domi
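
P.S. To make the intended behaviour concrete, here is the exclusion logic isolated from Nutch as a small stand-alone sketch. It deliberately uses no Nutch classes: the class name KeywordExclusionSketch and both helper methods are hypothetical, and the generic parameter D only stands in for NutchDocument. Unlike the snippet quoted above, it matches case-insensitively and tolerates a null text; both are assumptions about what is wanted, not something taken from the thread.

```java
import java.util.Locale;

// Stand-alone sketch: no Nutch classes are used here. In a real plugin this
// logic would sit inside the filter(...) method of an IndexingFilter, where
// returning null means the document is discarded.
final class KeywordExclusionSketch {

    // Hypothetical helper: true when the parsed text contains the keyword,
    // ignoring case. A null text or an empty keyword counts as "no match".
    static boolean containsKeyword(String text, String keyword) {
        if (text == null || keyword == null || keyword.isEmpty()) {
            return false;
        }
        return text.toLowerCase(Locale.ROOT)
                   .contains(keyword.toLowerCase(Locale.ROOT));
    }

    // Mirrors the filter contract with a generic stand-in for NutchDocument:
    // returns null to drop the document, or the same document to keep it.
    static <D> D filter(D doc, String parsedText, String keyword) {
        return containsKeyword(parsedText, keyword) ? null : doc;
    }
}
```

Inside the real filter method, the body would then presumably reduce to a single call such as KeywordExclusionSketch.filter(doc, parse.getText(), "nutch").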

