Hi all, I’m currently using Nutch 2.2.1 and noticed what seems to be a bug in the update step. Every time I run a crawl (using a modified bin/crawl script), the fetchtime is updated even for pages that were not fetched during the current crawl.
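
To make the symptom concrete, here is a toy model in Python. It is not Nutch code: the names (Page, buggy_update, expected_update, simulate) are made up for illustration, and the rescheduling rule "now + interval" is only my assumption about what the update step does. It merely shows why a page whose fetchtime is pushed forward on every update never becomes due.

```python
from dataclasses import dataclass

DAY = 24 * 60 * 60  # seconds in a day


@dataclass
class Page:
    url: str
    fetch_time: int      # time at which the page is next due (seconds)
    fetch_interval: int  # re-fetch interval (seconds)


def is_due(page, now):
    """A page is eligible for fetching once its fetch time has passed."""
    return page.fetch_time <= now


def buggy_update(pages, fetched_urls, now):
    # Assumed model of the behaviour I observe: every page is
    # rescheduled, whether or not it was fetched in this cycle.
    for page in pages:
        page.fetch_time = now + page.fetch_interval


def expected_update(pages, fetched_urls, now):
    # What I would expect: only pages fetched in this cycle are rescheduled.
    for page in pages:
        if page.url in fetched_urls:
            page.fetch_time = now + page.fetch_interval


def simulate(update, days=60):
    """Run one crawl cycle per day and count how often the page is due."""
    page = Page("http://example.com/a", fetch_time=30 * DAY, fetch_interval=30 * DAY)
    times_due = 0
    for day in range(days):
        now = day * DAY
        fetched = set()
        if is_due(page, now):
            times_due += 1
            fetched.add(page.url)
        update([page], fetched, now)
    return times_due


if __name__ == "__main__":
    print("buggy update, times due:   ", simulate(buggy_update))
    print("expected update, times due:", simulate(expected_update))
```

With daily cycles over 60 days and a 30-day interval, the page comes due once under the expected update and never under the buggy one.
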
I found the related bug report NUTCH-1457 [1] through a previous post on this list [2]. For me this means that Nutch 2.2.1 is unusable. I want to run continuous crawls in order to keep a Solr index of a website up-to-date. This bug basically ensures that most pages will never be fetched again as their fetchtime is increased on each updatedb.

Is there a workaround? Does this problem appear in Nutch 1.7?

Cheers,
Günter

[1] https://issues.apache.org/jira/browse/NUTCH-1457
[2] http://lucene.472066.n3.nabble.com/updatedb-in-nutch-2-0-increases-fetch-time-of-all-pages-td4008429.html

