Hi,

could you open a Jira issue for this?
Please add as much information and as many examples as possible:
- After how many cycles did it happen?
- What were the fetch time and fetch interval after each cycle (or crawler run)?
You could dump the fetch-scheduling information via
readdb/WebTableReader.
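
Once the fetch times are dumped, a small check like the one below could help
pin down the cycle in which they jump. This is only a sketch: it assumes you
have already parsed the dump into (url, next fetch time, fetch interval in
seconds) tuples yourself, and the function name and record layout are
hypothetical, not part of Nutch. The one-week limit is the
db.fetch.interval.max value reported in this thread.

```python
from datetime import datetime, timedelta, timezone

# db.fetch.interval.max as reported in this thread (one week, in seconds).
MAX_INTERVAL_S = 604800


def suspicious(records, now, max_interval_s=MAX_INTERVAL_S):
    """Return the records whose next fetch time lies beyond now + max interval.

    Each record is a hypothetical (url, fetch_time, interval_s) tuple, where
    fetch_time is a timezone-aware datetime. Parsing the real dump into this
    shape is left to the caller.
    """
    limit = now + timedelta(seconds=max_interval_s)
    return [r for r in records if r[1] > limit]


if __name__ == "__main__":
    now = datetime(2013, 8, 8, tzinfo=timezone.utc)
    # Made-up rows for illustration only, not real crawl data.
    sample = [
        ("http://example.com/a", now + timedelta(days=1), 86400),
        ("http://example.com/b", now + timedelta(days=90), 86400),
    ]
    for url, fetch_time, interval_s in suspicious(sample, now):
        print(url, fetch_time.isoformat(), interval_s)
```

Running it after every cycle and attaching the flagged rows to the Jira issue
would show when the fetch times first exceed the configured maximum.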

Thanks,
Sebastian


2013/8/8 Bai Shen <[email protected]>

> Yes, I started from scratch again.  I deleted my HBase instance and
> reinjected my seed.
>
> It's definitely a bug.  My problem is that I've scoured the scheduler code
> and can't find where it's going wrong.  It works correctly for the first
> bunch of cycles where I check the fetch times.  Then once I leave it to
> crawl overnight, I return to find the far future fetch times.
>
>
> On Wed, Aug 7, 2013 at 3:59 PM, Sebastian Nagel
> <[email protected]> wrote:
>
> > Hi,
> >
> > looks like a bug. There are a couple of issues open related to
> > fetch scheduling. AFAIK, all have been observed with 1.x,
> > but 2.x should also be affected. However, the problems are
> > the opposite: too short re-fetch intervals.
> >
> > I'll keep this on the radar for NUTCH-1502.
> >
> > > Originally I had db.fetch.schedule.class set to
> > > org.apache.nutch.crawl.AdaptiveFetchSchedule.  However, I changed it
> > > back to the default as I thought it was the problem.  However, the
> > > behavior occurs with both it and the default scheduler.
> >
> > Did you then start from scratch again? Otherwise the next fetch time
> > is still far in the future and the fetch interval remains too large.
> >
> > Sebastian
> >
> > On 08/07/2013 03:30 PM, Bai Shen wrote:
> > > Sorry for the delayed reply.  I somehow missed it when it originally
> > > came in.
> > >
> > > db.fetch.schedule.class is unchanged
> > > db.fetch.interval.default is 86400
> > > db.fetch.interval.max is 604800
> > > db.fetch.schedule.adaptive.min_interval is 3600
> > > db.fetch.schedule.adaptive.max_interval is unchanged
> > > db.fetch.schedule.adaptive.sync_delta is unchanged
> > >
> > > Originally I had db.fetch.schedule.class set to
> > > org.apache.nutch.crawl.AdaptiveFetchSchedule.  However, I changed it
> > > back to the default as I thought it was the problem.  However, the
> > > behavior occurs with both it and the default scheduler.
> > >
> > >
> > > On Wed, Jul 17, 2013 at 2:57 PM, Sebastian Nagel
> > > <[email protected]> wrote:
> > >
> > >> Hi,
> > >>
> > >> can you send values of the following properties (esp. if they differ
> > >> from default):
> > >>   db.fetch.schedule.class
> > >>   db.fetch.interval.default
> > >>   db.fetch.interval.max
> > >>   db.fetch.schedule.adaptive.min_interval
> > >>   db.fetch.schedule.adaptive.max_interval
> > >>   db.fetch.schedule.adaptive.sync_delta
> > >>
> > >> Sebastian
> > >>
> > >> On 07/17/2013 06:58 PM, Bai Shen wrote:
> > >>> I'm using Nutch 2.x HEAD with the default scheduler.  I have the max
> > >>> fetch interval set to one week and the fetch interval set to one day.
> > >>>
> > >>> Everything seems to work correctly for a while.  Pages show up as
> > >>> fetched with a fetch time of the next day.  However, after a couple of
> > >>> days generate produces no urls to fetch.  Looking at the url db stats
> > >>> shows that the fetch time is set months in the future.
> > >>>
> > >>> I've dug through the fetcher and scheduler code and can't see anything
> > >>> that would be causing this.  Any suggestions as to what to look at next
> > >>> or things to try?
> > >>>
> > >>> Thanks.
> > >>>
> > >>
> > >>
> > >
> >
> >
>
