We found the problem. Looking into the code of `AdaptiveFetchSchedule`, a `defaultInterval` is used the first time each record is scheduled, and it is read from the configuration parameter "db.fetch.interval.default". That parameter was not set in our configuration, and the `AbstractFetchSchedule` implementation falls back to 0, which forced a re-fetch in every consecutive fetch phase. Sneaky. :-)
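For anyone hitting the same thing, here is a minimal sketch of the fix as it would look in "nutch-site.xml". The value 2592000 (seconds, i.e. 30 days) is only an assumption taken from what the stock "nutch-default.xml" ships with; choose whatever interval suits your crawl.

```xml
<!-- Goes inside the <configuration> element of nutch-site.xml. -->
<!-- Value is in seconds; 2592000 (30 days) is an assumed example, not a recommendation. -->
<property>
  <name>db.fetch.interval.default</name>
  <value>2592000</value>
</property>
```

With the property present, new records should start from this interval instead of 0.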
To avoid banal issues like this, the default values in the code should be the same as the defaults in "nutch-site.xml". Otherwise you never know what will happen.

Cheers,
Zoltán

On 2017-11-18 15:48:06, Zoltán Zvara <[email protected]> wrote:
Hi Sebastian,

We tried it but sites still get fetched every 1-2 hours, which is roughly one iteration. Any other ideas? Maybe on how to debug it?

Thanks,
Zoltán

On 2017-11-12 15:34:45, Sebastian Nagel <[email protected]> wrote:
Hi Zoltán,

it's probably a bug (NUTCH-1564), try to set sync_delta to false.

Best,
Sebastian

On 11/10/2017 04:12 PM, Zoltán Zvara wrote:
> Dear Community,
>
> db.fetch.schedule.adaptive.min_interval is not respected by Nutch 1.13. It is
> set to "86400", but a specific index of a site is fetched every 1-2 hours.
> What could be the problem?
>
> Other configurations are:
> db.fetch.schedule.class = "org.apache.nutch.crawl.AdaptiveFetchSchedule"
> db.fetch.schedule.adaptive.min_interval = "86400"
> db.fetch.schedule.adaptive.inc_rate = "0.4"
> db.fetch.schedule.adaptive.dec_rate = "0.2"
> db.fetch.schedule.adaptive.sync_delta = "true"
> db.fetch.schedule.adaptive.sync_delta_rate = "0.3"
>
> On generate the top is: 50000, number-of-lists: 50, number-of-segments: 1
>
> Thanks,
> Zoltán
>

