On second thought, it doesn't seem that the 'generate' phase checks the modified timestamp of every page. The next fetch time seems to be pre-calculated by the previous generate-fetch-update cycle.
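To illustrate what I mean by "pre-calculated", here is a rough sketch in Python. It is not Nutch code. The function names are made up for illustration, and the 30-day value is my assumption about the default of db.fetch.interval.default (2592000 seconds). The idea is that the update step stores a next fetch time, and a later 'generate' only compares that stored time with the current time.

```python
from datetime import datetime, timedelta

# Assumption: db.fetch.interval.default defaults to 2592000 seconds (30 days).
DEFAULT_INTERVAL_SECONDS = 30 * 24 * 60 * 60


def schedule_next_fetch(last_fetch, interval_seconds=DEFAULT_INTERVAL_SECONDS):
    """Hypothetical 'update' step: store last fetch time plus the interval."""
    return last_fetch + timedelta(seconds=interval_seconds)


def is_due(next_fetch, now):
    """Hypothetical 'generate' check: select a page only once its stored time has passed."""
    return next_fetch <= now


last_fetch = datetime(2012, 8, 14, 13, 0, 0)
next_fetch = schedule_next_fetch(last_fetch)

print(next_fetch)                                # 2012-09-13 13:00:00
print(is_due(next_fetch, datetime(2012, 8, 20)))  # False: too early, page is skipped
print(is_due(next_fetch, datetime(2012, 9, 14)))  # True: page becomes eligible again
```

If this is roughly right, running 'generate' more often would not by itself cause a re-fetch before the stored time has passed.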
More experienced users may be able to explain how the next fetch time is calculated. From the crawldb output, it appears that a month was added to the last fetch time, though I only checked my target site's home pages.

On Tue, Aug 14, 2012 at 1:26 PM, Sourajit Basak <[email protected]> wrote:

> What is "adaptive fetch schedule" as dictated by the property
> *db.fetch.schedule.adaptive.sync_delta*? If this is set to true, how does
> property *db.fetch.interval.default* come into effect?
>
> I guess the 'generate' phase checks for the modified timestamp of every
> page in the crawldb. If a page does change, Nutch decides whether to
> re-fetch based on the property
> "*db.fetch.schedule.adaptive.sync_delta_rate*". Is this assumption correct?
>
> If yes, what does the default fetch interval mean in this context? The
> re-fetch seems to be affected for such cases by how often I run "generate".

