Hi all! I was trying to use the option indexer.delete.robots.noindex (exclude a page from indexing when <meta name="robots" content="noindex"> is encountered).
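For reference, the setup I describe below amounts to roughly this nutch-site.xml fragment (a minimal sketch; the plugin.includes value is abbreviated and the rest of my configuration is omitted):

```xml
<!-- Sketch of the relevant nutch-site.xml properties. -->
<property>
  <name>plugin.includes</name>
  <value>...|parse-metatags|index-metadata|...</value>
</property>
<property>
  <name>metatags.names</name>
  <value>robots</value>
</property>
<property>
  <name>index.parse.md</name>
  <value>metatag.robots</value>
</property>
<property>
  <name>indexer.delete.robots.noindex</name>
  <value>true</value>
</property>
```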
However, the page I'm testing with is still being indexed. I have the parse-metatags and index-metadata plugins activated, with indexer.delete.robots.noindex=true, metatags.names="robots" and index.parse.md="metatag.robots".

Looking at IndexerMapReduce.java (line 257) [1], the field that is checked is "robots", not "metatag.robots". It does work as expected when I change it to "metatag.robots".

Before:

  Indexing 3/3 documents
  Deleting 0 documents
  Indexer: number of documents indexed, deleted, or skipped:
  Indexer: 3 indexed (add/update)

After:

  Indexing 2/2 documents
  Deleting 0 documents
  Indexer: number of documents indexed, deleted, or skipped:
  Indexer: 1 deleted (robots=noindex)
  Indexer: 2 indexed (add/update)

Am I missing something, and this is not actually a bug but rather a misconfiguration on my part? Or is it a bug, and should I file a report/patch?

Thanks!
Felix

[1] https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/indexer/IndexerMapReduce.java#L257
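To make the mismatch concrete, here is a tiny stand-alone sketch of the behavior I think I'm seeing (the Map stands in for a document's index fields; shouldDelete is a hypothetical helper, not Nutch code): with parse-metatags + index-metadata, the robots value ends up under the field name "metatag.robots", so a lookup of "robots" finds nothing and the document is never deleted.

```java
import java.util.HashMap;
import java.util.Map;

public class RobotsNoindexCheck {
    // Hypothetical helper mimicking the noindex check: look up the given
    // field and test whether its value contains "noindex".
    static boolean shouldDelete(Map<String, String> fields, String fieldName) {
        String robots = fields.get(fieldName);
        return robots != null && robots.toLowerCase().contains("noindex");
    }

    public static void main(String[] args) {
        // With parse-metatags + index-metadata, the value is stored
        // under "metatag.robots", not "robots".
        Map<String, String> doc = new HashMap<>();
        doc.put("metatag.robots", "noindex,nofollow");

        System.out.println(shouldDelete(doc, "robots"));          // false
        System.out.println(shouldDelete(doc, "metatag.robots"));  // true
    }
}
```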

