On second thought, it doesn't seem that the 'generate' phase checks the
modified timestamp of every page. Instead, the next fetch time seems to be
precalculated by a previous generate-fetch-update cycle.
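Here is a minimal sketch of that behaviour. It assumes that each crawldb entry stores a precomputed next fetch time and that 'generate' only compares that time with the current clock, without re-checking the page itself. `CrawlRecord` and `select_due` are hypothetical names for illustration, not Nutch classes.

```python
from dataclasses import dataclass


@dataclass
class CrawlRecord:
    # Hypothetical stand-in for a crawldb entry; not a Nutch class.
    url: str
    fetch_time: int  # precomputed next fetch time, epoch seconds


def select_due(records, now):
    """Return the URLs whose stored fetch time has been reached.

    Assumption: generate compares only the stored fetch time with the
    current time; it never looks at the live page's modified timestamp.
    """
    return [r.url for r in records if r.fetch_time <= now]


records = [
    CrawlRecord("http://example.com/", fetch_time=1_000),
    CrawlRecord("http://example.com/about", fetch_time=5_000),
]
print(select_due(records, now=2_000))  # ['http://example.com/']
```

Under this assumption, a page that becomes due is only picked up the next time 'generate' runs. That would fit the observation in the quoted message that re-fetching depends on how often 'generate' is run.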

Could someone more experienced explain how the next fetch time is calculated?
From the crawldb output, it seems to add a month to the last fetch time,
though I only checked my target site's home pages.
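The "month" is consistent with the shipped default. nutch-default.xml sets db.fetch.interval.default to 2592000 seconds (30 days), and the default schedule (DefaultFetchSchedule) adds the entry's interval to the time of the last fetch. Below is a minimal sketch of that arithmetic. `next_fetch_time` is a hypothetical helper, not a Nutch API, and the date is only illustrative.

```python
from datetime import datetime, timedelta, timezone

# db.fetch.interval.default from nutch-default.xml: 2592000 s = 30 days
DEFAULT_INTERVAL = timedelta(seconds=2_592_000)


def next_fetch_time(last_fetch, interval=DEFAULT_INTERVAL):
    """Default schedule, as I understand it: last fetch + fixed interval."""
    return last_fetch + interval


last = datetime(2012, 8, 14, 13, 26, tzinfo=timezone.utc)
print(next_fetch_time(last))  # 2012-09-13 13:26:00+00:00
```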

On Tue, Aug 14, 2012 at 1:26 PM, Sourajit Basak wrote:

> What is the "adaptive fetch schedule" controlled by the property
> *db.fetch.schedule.adaptive.sync_delta*? If this is set to true, how does
> the property *db.fetch.interval.default* come into effect?
>
> I guess the 'generate' phase checks the modified timestamp of every
> page in the crawldb. If a page has changed, Nutch decides whether to
> re-fetch it based on the property
> *db.fetch.schedule.adaptive.sync_delta_rate*. Is this assumption correct?
>
> If so, what does the default fetch interval mean in this context? In such
> cases, re-fetching seems to depend on how often I run "generate".
>
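For the quoted questions, here is a sketch of how I understand Nutch's AdaptiveFetchSchedule to adjust the interval. The adaptive schedule is selected with db.fetch.schedule.class; sync_delta is only one of its options. db.fetch.interval.default is the starting interval for a new page. On each update, the interval shrinks by dec_rate if the page changed and grows by inc_rate if it did not, then it is clamped to min/max. Whether a page "changed" is decided at updatedb time, as far as I know by comparing content signatures, not at generate time. With sync_delta on, the reference time is shifted back by sync_delta_rate times the time since the last modification. The constants are the nutch-default.xml defaults as I recall them (check your version). `adapt` is a hypothetical name, and the "unknown" change state is omitted.

```python
# Sketch of AdaptiveFetchSchedule's interval arithmetic, as I understand it.
# Constants mirror nutch-default.xml defaults (verify for your version).
INC_RATE = 0.4               # db.fetch.schedule.adaptive.inc_rate
DEC_RATE = 0.2               # db.fetch.schedule.adaptive.dec_rate
MIN_INTERVAL = 60.0          # db.fetch.schedule.adaptive.min_interval (s)
MAX_INTERVAL = 31_536_000.0  # db.fetch.schedule.adaptive.max_interval (s)
SYNC_DELTA = True            # db.fetch.schedule.adaptive.sync_delta
SYNC_DELTA_RATE = 0.3        # db.fetch.schedule.adaptive.sync_delta_rate
DAY = 86_400


def adapt(interval, fetch_time, modified_time, changed):
    """Return (new_interval, next_fetch_time); all times in seconds."""
    if modified_time <= 0:
        modified_time = fetch_time
    if changed:
        interval *= 1.0 - DEC_RATE  # page changed: come back sooner
        modified_time = fetch_time
    else:
        interval *= 1.0 + INC_RATE  # page unchanged: back off
    ref_time = fetch_time
    if SYNC_DELTA:
        # Try to line the schedule up with when the page last changed.
        delta = fetch_time - modified_time
        if delta > interval:
            interval = delta
        ref_time = fetch_time - round(delta * SYNC_DELTA_RATE)
    interval = min(max(interval, MIN_INTERVAL), MAX_INTERVAL)
    return interval, ref_time + round(interval)


fetch = 100 * DAY
# Unchanged page, last modified 10 days before this fetch, 30-day start:
interval, nxt = adapt(30 * DAY, fetch, fetch - 10 * DAY, changed=False)
print(round(interval / DAY, 6), (nxt - fetch) / DAY)  # 42.0 39.0
```

So in this sketch the default interval only seeds the schedule. After that, each update moves the next fetch time, and 'generate' still has to run after that time for the page to be picked up.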
