Hi, looks like a bug. There are a couple of open issues related to fetch scheduling. Afaik, all of them have been observed with 1.x, but 2.x should also be affected. However, the problem in those issues is the opposite: re-fetch intervals that are too short.
I'll keep this on the radar for NUTCH-1502.

> Originally I had db.fetch.schedule.class set to
> org.apache.nutch.crawl.AdaptiveFetchSchedule. However, I changed it back
> to the default as I thought it was the problem. However, the behavior
> occurs with both it and the default scheduler.

Did you then start from scratch again? Otherwise the next fetch time is still
far in the future and the fetch interval stays too large.

Sebastian

On 08/07/2013 03:30 PM, Bai Shen wrote:
> Sorry for the delayed reply. I somehow missed it when it originally came
> in.
>
> db.fetch.schedule.class is unchanged
> db.fetch.interval.default is 86400
> db.fetch.interval.max is 604800
> db.fetch.schedule.adaptive.min_interval is 3600
> db.fetch.schedule.adaptive.max_interval is unchanged
> db.fetch.schedule.adaptive.sync_delta is unchanged
>
> Originally I had db.fetch.schedule.class set to
> org.apache.nutch.crawl.AdaptiveFetchSchedule. However, I changed it back
> to the default as I thought it was the problem. However, the behavior
> occurs with both it and the default scheduler.
>
>
> On Wed, Jul 17, 2013 at 2:57 PM, Sebastian Nagel <[email protected]> wrote:
>
>> Hi,
>>
>> can you send values of the following properties (esp. if they differ from
>> default):
>> db.fetch.schedule.class
>> db.fetch.interval.default
>> db.fetch.interval.max
>> db.fetch.schedule.adaptive.min_interval
>> db.fetch.schedule.adaptive.max_interval
>> db.fetch.schedule.adaptive.sync_delta
>>
>> Sebastian
>>
>> On 07/17/2013 06:58 PM, Bai Shen wrote:
>>> I'm using Nutch 2.x HEAD with the default scheduler. I have the max fetch
>>> interval set to one week and the fetch interval set to one day.
>>>
>>> Everything seems to work correctly for a while. Pages show up as fetched
>>> with a fetch time of the next day. However, after a couple of days
>>> generate produces no urls to fetch. Looking at the url db stats shows that
>>> the fetch time is set months in the future.
>>>
>>> I've dug through the fetcher and scheduler code and can't see anything that
>>> would be causing this. Any suggestions as to what to look at next or
>>> things to try?
>>>
>>> Thanks.
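
A minimal sketch of the "start from scratch" point made in the reply, written as a simplified model and not as actual Nutch code. It assumes that generate only selects URLs whose stored fetch time has passed, and that changing the scheduler setting does not rewrite fetch times already stored. The names `due_urls`, `existing_db`, and `fresh_db`, the example URL, and the 90-day offset are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def due_urls(crawl_db, now):
    """Return the URLs whose stored fetch time has passed (simplified generate rule)."""
    return sorted(url for url, fetch_time in crawl_db.items() if fetch_time <= now)


now = datetime(2013, 8, 7)

# Hypothetical existing table: the fetch time was already written months ahead.
# Assumption: switching the scheduler afterwards leaves this stored value untouched.
existing_db = {"http://example.com/": now + timedelta(days=90)}

# Hypothetical fresh table: assumption that a newly injected URL is due right away.
fresh_db = {"http://example.com/": now}

print(due_urls(existing_db, now + timedelta(days=7)))  # stored time has not passed yet
print(due_urls(fresh_db, now))                         # stored time equals "now"
```

Under these assumptions, the existing table keeps producing nothing to fetch until the stored time arrives, whichever scheduler is configured, while a freshly injected table is eligible immediately.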

