Nutch can detect 404s by recrawling existing URLs. That change, however, is currently not propagated to Solr.
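Until Nutch forwards these deletions itself, one workaround is to issue Solr delete commands by hand for URLs that came back 404. A minimal sketch, assuming a Solr core reachable at the usual `http://localhost:8983/solr/update` endpoint and a hypothetical document id equal to the page URL (both are assumptions, not something Nutch does for you):

```shell
#!/bin/sh
# Hypothetical URL of a page that the recrawl reported as 404.
URL="http://example.com/removed-page"

# Build Solr's XML delete command for that document id.
DELETE_XML="<delete><id>${URL}</id></delete>"
echo "$DELETE_XML"

# To actually remove it from the index, POST the command and commit, e.g.:
# curl "http://localhost:8983/solr/update?commit=true" \
#   -H "Content-Type: text/xml" --data-binary "$DELETE_XML"
```

A `<delete><query>...</query></delete>` body works the same way if you want to purge by query rather than by id.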
> As far as I know, Nutch can only discover new URLs to crawl and send the
> parsed content to Solr. But what about maintaining the index? Say that
> you have a daily Nutch script that fetches/parses the web and updates
> the Solr index. After one month, several web pages have been modified
> and some have also been deleted. In other words, the Solr index is out
> of sync.
>
> Is it possible to detect such changes in order to send update/delete
> commands to Solr?
>
> It looks like the Aperture crawler has a workaround for this, since the
> crawler handler has methods such as objectChanged(...):
> http://sourceforge.net/apps/trac/aperture/wiki/Crawlers
>
> Erlend