Yes, I started from scratch again. I deleted my HBase instance and reinjected my seed.
It's definitely a bug. My problem is that I've scoured the scheduler code and can't find where it's going wrong. It works correctly for the first bunch of cycles where I check the fetch times. Then once I leave it to crawl overnight, I return to find the far future fetch times.

On Wed, Aug 7, 2013 at 3:59 PM, Sebastian Nagel <[email protected]> wrote:

> Hi,
>
> looks like a bug. There are a couple of issues open related to
> fetch scheduling. Afaik, all have been observed with 1.x,
> but 2.x should be also affected. However, the problems are
> the opposite: too short re-fetch intervals.
>
> I'll keep this on the radar for NUTCH-1502.
>
> > Originally I had db.fetch.schedule.class set to
> > org.apache.nutch.crawl.AdaptiveFetchSchedule. However, I changed it back
> > to the default as I thought it was the problem. However, the behavior
> > occurs with both it and the default scheduler.
>
> Did you then start from scratch again? Otherwise the next fetch time
> is still far in the future and the fetch interval keeps too large.
>
> Sebastian
>
> On 08/07/2013 03:30 PM, Bai Shen wrote:
> > Sorry for the delayed reply. I somehow missed it when it originally came
> > in.
> >
> > db.fetch.schedule.class is unchanged
> > db.fetch.interval.default is 86400
> > db.fetch.interval.max is 604800
> > db.fetch.schedule.adaptive.min_interval is 3600
> > db.fetch.schedule.adaptive.max_interval is unchanged
> > db.fetch.schedule.adaptive.sync_delta is unchanged
> >
> > Originally I had db.fetch.schedule.class set to
> > org.apache.nutch.crawl.AdaptiveFetchSchedule. However, I changed it back
> > to the default as I thought it was the problem. However, the behavior
> > occurs with both it and the default scheduler.
> >
> >
> > On Wed, Jul 17, 2013 at 2:57 PM, Sebastian Nagel <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> can you send values of the following properties (esp. if they differ from
> >> default):
> >> db.fetch.schedule.class
> >> db.fetch.interval.default
> >> db.fetch.interval.max
> >> db.fetch.schedule.adaptive.min_interval
> >> db.fetch.schedule.adaptive.max_interval
> >> db.fetch.schedule.adaptive.sync_delta
> >>
> >> Sebastian
> >>
> >> On 07/17/2013 06:58 PM, Bai Shen wrote:
> >>> I'm using Nutch 2.x HEAD with the default scheduler. I have the max fetch
> >>> interval set to one week and the fetch interval set to one day.
> >>>
> >>> Everything seems to work correctly for a while. Pages show up as fetched
> >>> with a fetch time of the next day. However, after a couple of days
> >>> generate produces no urls to fetch. Looking at the url db stats shows that
> >>> the fetch time is set months in the future.
> >>>
> >>> I've dug through the fetcher and scheduler code and can't see anything that
> >>> would be causing this. Any suggestions as to what to look at next or
> >>> things to try?
> >>>
> >>> Thanks.
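
For anyone trying to reproduce this, here is a minimal sketch of a sanity check for the settings quoted in the thread above. It assumes that a next fetch time should never be later than "now" plus db.fetch.interval.max. That bound is my assumption, not something confirmed from the scheduler code. The class and method names (FetchTimeCheck, isSuspicious) are hypothetical and not part of Nutch.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not part of Nutch: flags fetch times that look too far out.
final class FetchTimeCheck {
    // Values reported in the thread, in seconds.
    static final long DEFAULT_INTERVAL_SEC = 86_400;  // db.fetch.interval.default (one day)
    static final long MAX_INTERVAL_SEC = 604_800;     // db.fetch.interval.max (one week)

    // Assumption: no page should be scheduled later than now + max interval.
    static Instant latestExpected(Instant now) {
        return now.plusSeconds(MAX_INTERVAL_SEC);
    }

    // True when the next fetch time lies beyond the assumed upper bound.
    static boolean isSuspicious(Instant now, Instant nextFetch) {
        return nextFetch.isAfter(latestExpected(now));
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2013-08-07T16:00:00Z");
        Instant nextDay = now.plusSeconds(DEFAULT_INTERVAL_SEC);
        Instant monthsAhead = now.plus(Duration.ofDays(90));

        System.out.println(isSuspicious(now, nextDay));      // false
        System.out.println(isSuspicious(now, monthsAhead));  // true
    }
}
```

Running a check like this against the stored fetch times after each cycle, rather than only after an overnight crawl, might help narrow down which cycle first writes the bad value.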

