I have Nutch set up to crawl my local filesystem and have it linked to Solr.
Everything works fine, except for one case: when I delete a document that was previously indexed and then recrawl using the "./nutch crawl" command, the deleted document does not seem to be registered with status DB_GONE. After the recrawl I run "./nutch readdb <crawldb> -stats", and the deleted documents are marked as unfetched.

The weird thing is that if I add 404 purging to my nutch-site.xml file, the links to the deleted documents do get removed. So it seems that a deleted document may be marked as DB_GONE during the crawl, but is no longer marked that way by the end of the crawl.

If you need to know any of my configuration settings, you can check the posts on my blog, which are written as set-up guides: http://amac4.blogspot.co.uk/

Thanks,
Allan

