Hi,

The fetch time increasing is indeed a bug. There is already an issue for it: https://issues.apache.org/jira/browse/NUTCH-1457
About removing urls, I'm not sure what the best solution is. It is difficult to handle changes to normalizing/filtering rules over time. For now it is best not to change rules in an existing crawl; otherwise you have to run a custom delete tool or something like that.

Ferdy.

On Mon, Sep 17, 2012 at 8:57 PM, <[email protected]> wrote:
> Hello,
>
> updatedb in nutch-2.0 increases the fetch time of all pages, regardless
> of whether they have already been fetched or not.
> For example, if updatedb is applied at depth 1 and page A is fetched and
> its fetchTime is 30 days from now, then as a result of running updatedb
> at depth 2 the fetch time of page A will be 60 days from now, and so on.
>
> Also, I wondered if it is possible to remove pages that do not pass the
> filters from the HBase datastore by using updatedb?
>
> Thanks.
> Alex.
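For what it's worth, the "custom delete tool" idea could look roughly like the sketch below. This is a hypothetical Python illustration only, not Nutch code: the datastore is simulated with a plain dict, and the filter is a simple callable that mirrors the contract of Nutch's URLFilter.filter() (return the URL if it passes, None if rejected). A real Nutch 2.x tool would instead iterate the Gora DataStore backed by HBase and call delete() on failing keys.

```python
# Hypothetical sketch of a custom delete tool: scan all stored URLs and
# remove the ones the current filter rules reject. The dict stands in
# for the HBase-backed datastore; everything here is illustrative.

def delete_filtered(datastore, url_filter):
    """Remove every entry whose URL no longer passes the filter.

    datastore  -- dict mapping URL -> page record (stand-in for the store)
    url_filter -- callable returning the URL if it passes, else None
                  (mirrors the URLFilter.filter() contract)
    """
    to_delete = [url for url in datastore if url_filter(url) is None]
    for url in to_delete:
        del datastore[url]
    return to_delete

# Example: a rule that now rejects anything outside example.com
store = {
    "http://example.com/a": {"fetchTime": 123},
    "http://other.org/b": {"fetchTime": 456},
}
reject_foreign = lambda u: u if "example.com" in u else None
removed = delete_filtered(store, reject_foreign)
print(removed)      # ['http://other.org/b']
print(list(store))  # ['http://example.com/a']
```

The point is that the scan-and-delete pass runs separately from updatedb, so you can apply it once after changing the rules rather than relying on updatedb to clean up.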

