Yes, I started from scratch again.  I deleted my HBase instance and
reinjected my seed list.

It's definitely a bug.  My problem is that I've scoured the scheduler code
and can't find where it's going wrong.  It works correctly for the first
several cycles, while I'm checking the fetch times.  But once I leave it to
crawl overnight, I come back to find fetch times far in the future.
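The symptom can be stated as a simple check. Below is a rough Java sketch, assuming that the default schedule sets the next fetch time to the current fetch time plus db.fetch.interval.default, and that intervals are configured in seconds while fetch times are stored as epoch milliseconds. Under those assumptions, no stored fetch time should ever lie more than db.fetch.interval.max ahead of the current time. The class FetchTimeCheck and its methods are hypothetical and not part of Nutch.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical diagnostic, not part of Nutch: flags a page whose stored
 * fetch time lies further ahead than db.fetch.interval.max should allow.
 */
public class FetchTimeCheck {
    // Values from the configuration quoted in this thread, in seconds.
    static final long INTERVAL_DEFAULT_S = 86400;  // db.fetch.interval.default (1 day)
    static final long INTERVAL_MAX_S = 604800;     // db.fetch.interval.max (1 week)

    /** Expected next fetch time (ms), assuming interval (s) is added to fetch time (ms). */
    static long expectedNextFetch(long fetchTimeMs, long intervalS) {
        return fetchTimeMs + TimeUnit.SECONDS.toMillis(intervalS);
    }

    /** True if the stored fetch time is more than db.fetch.interval.max ahead of now. */
    static boolean isTooFarAhead(long storedFetchTimeMs, long nowMs) {
        return storedFetchTimeMs - nowMs > TimeUnit.SECONDS.toMillis(INTERVAL_MAX_S);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long next = expectedNextFetch(now, INTERVAL_DEFAULT_S);
        // With a one-day default interval the next fetch is one day out.
        System.out.println("next fetch in days: " + TimeUnit.MILLISECONDS.toDays(next - now)); // 1
        // A fetch time about three months ahead, like the one seen after an overnight crawl.
        long observed = now + TimeUnit.DAYS.toMillis(90);
        System.out.println("too far ahead: " + isTooFarAhead(observed, now)); // true
    }
}
```

Any row for which isTooFarAhead returns true under these settings points at a scheduling step that added more than the configured interval, or mixed up seconds and milliseconds.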


On Wed, Aug 7, 2013 at 3:59 PM, Sebastian Nagel
<[email protected]> wrote:

> Hi,
>
> looks like a bug. There are a couple of open issues related to
> fetch scheduling. AFAIK, all have been observed with 1.x,
> but 2.x should also be affected. However, those problems are
> the opposite: re-fetch intervals that are too short.
>
> I'll keep this on the radar for NUTCH-1502.
>
> > Originally I had db.fetch.schedule.class set to
> > org.apache.nutch.crawl.AdaptiveFetchSchedule.  I changed it back to the
> > default because I thought it was the problem, but the behavior occurs
> > with both it and the default scheduler.
>
> Did you then start from scratch again? Otherwise the next fetch time
> is still far in the future and the fetch interval remains too large.
>
> Sebastian
>
> On 08/07/2013 03:30 PM, Bai Shen wrote:
> > Sorry for the delayed reply.  I somehow missed it when it originally came
> > in.
> >
> > db.fetch.schedule.class is unchanged
> > db.fetch.interval.default is 86400
> > db.fetch.interval.max is 604800
> > db.fetch.schedule.adaptive.min_interval is 3600
> > db.fetch.schedule.adaptive.max_interval is unchanged
> > db.fetch.schedule.adaptive.sync_delta is unchanged
> >
> > Originally I had db.fetch.schedule.class set to
> > org.apache.nutch.crawl.AdaptiveFetchSchedule.  I changed it back to the
> > default because I thought it was the problem, but the behavior occurs
> > with both it and the default scheduler.
> >
> >
> > On Wed, Jul 17, 2013 at 2:57 PM, Sebastian Nagel <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> can you send the values of the following properties (esp. if they
> >> differ from the default):
> >>   db.fetch.schedule.class
> >>   db.fetch.interval.default
> >>   db.fetch.interval.max
> >>   db.fetch.schedule.adaptive.min_interval
> >>   db.fetch.schedule.adaptive.max_interval
> >>   db.fetch.schedule.adaptive.sync_delta
> >>
> >> Sebastian
> >>
> >> On 07/17/2013 06:58 PM, Bai Shen wrote:
> >>> I'm using Nutch 2.x HEAD with the default scheduler.  I have the max
> >>> fetch interval set to one week and the fetch interval set to one day.
> >>>
> >>> Everything seems to work correctly for a while.  Pages show up as
> >>> fetched with a fetch time of the next day.  However, after a couple of
> >>> days, generate produces no URLs to fetch.  Looking at the URL db stats
> >>> shows that the fetch time is set months in the future.
> >>>
> >>> I've dug through the fetcher and scheduler code and can't see anything
> >>> that would be causing this.  Any suggestions as to what to look at next
> >>> or things to try?
> >>>
> >>> Thanks.
> >>>
> >>
> >>
> >
>
>
