We found the problem. Looking at the code of `AdaptiveFetchSchedule`, a 
`defaultInterval` is used the first time each record is scheduled, and it is 
read from the configuration parameter "db.fetch.interval.default". That 
parameter was not set in our configuration, so the `AbstractFetchSchedule` 
implementation fell back to 0, which forced a re-fetch in every consecutive 
fetch phase. Sneaky. :-)
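
For anyone hitting the same thing, a minimal sketch of the workaround is to set 
the parameter explicitly in "nutch-site.xml". The value 86400 below is only an 
assumption, chosen to match the min_interval from my original mail; it is not 
copied from our actual configuration, so pick whatever interval suits your crawl.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Set the initial fetch interval explicitly so the schedule does not
       fall back to the in-code default of 0. -->
  <property>
    <name>db.fetch.interval.default</name>
    <!-- Illustrative value in seconds (one day); an assumption, adjust as needed. -->
    <value>86400</value>
  </property>
</configuration>
```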

To avoid banal issues like this, the default values in the code should be the 
same as the defaults in "nutch-site.xml". Otherwise you never know what will 
happen.

Cheers,
Zoltán

On 2017-11-18 15:48:06, Zoltán Zvara <[email protected]> wrote:
Hi Sebastian,

We tried it but sites still get fetched every 1-2 hours, which is roughly one 
iteration.

Any other ideas? Maybe on how to debug it?

Thanks,
Zoltán

On 2017-11-12 15:34:45, Sebastian Nagel <[email protected]> wrote:
Hi Zoltán,

It's probably a bug (NUTCH-1564); try setting sync_delta to false.

Best,
Sebastian

On 11/10/2017 04:12 PM, Zoltán Zvara wrote:
> Dear Community,
>
> db.fetch.schedule.adaptive.min_interval is not respected by Nutch 1.13. It is 
> set to "86400", but a specific index of a site is fetched every 1-2 hours. 
> What could be the problem?
>
> Other configurations are:
> db.fetch.schedule.class = "org.apache.nutch.crawl.AdaptiveFetchSchedule"
> db.fetch.schedule.adaptive.min_interval = "86400"
> db.fetch.schedule.adaptive.inc_rate = "0.4"
> db.fetch.schedule.adaptive.dec_rate = "0.2"
> db.fetch.schedule.adaptive.sync_delta = "true"
> db.fetch.schedule.adaptive.sync_delta_rate = "0.3"
>
> For generate, top is 50000, number-of-lists is 50, and number-of-segments is 1.
>
> Thanks,
> Zoltán
>
