Re: Extremely long fetch time

Bai Shen Mon, 03 Jun 2013 08:32:24 -0700

The time on the machine is set correctly.

It's an internal website.  Is there anything in particular in the crawl
datum you're looking for?


It's not a freshly injected URL, and all of my URLs seem to have the
long fetch times.  It also seems odd that the max interval wouldn't cause a
fetch.

I was able to do generate -adddays 6000 and run a fetch.  I'm waiting for
it to finish so I can check if this created long fetch times as well.
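
For reference, here is a rough sketch of why 30 days was not enough but 6000
was. It assumes the generator treats a page as due when its fetch time is at
or before "now plus adddays"; that rule, the class name FetchTimeCheck, and
its helper methods are illustrative, not Nutch code. The 2027-01-01 date is a
placeholder, since readdb only showed that the fetch time falls in 2027.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.concurrent.TimeUnit;

public class FetchTimeCheck {

    // Assumed eligibility rule: a page is due when its fetch time is
    // at or before the current time shifted forward by addDays.
    public static boolean isDue(long fetchTimeMillis, long nowMillis, long addDays) {
        long cutoff = nowMillis + TimeUnit.DAYS.toMillis(addDays);
        return fetchTimeMillis <= cutoff;
    }

    // Helper: midnight UTC of the given date as epoch milliseconds.
    public static long toMillis(int year, int month, int day) {
        return LocalDate.of(year, month, day)
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant()
                .toEpochMilli();
    }

    public static void main(String[] args) {
        long now = toMillis(2013, 6, 3);        // date of this thread
        long fetchTime = toMillis(2027, 1, 1);  // placeholder for "in 2027"

        System.out.println("due with -adddays 30:   " + isDue(fetchTime, now, 30));
        System.out.println("due with -adddays 6000: " + isDue(fetchTime, now, 6000));
    }
}
```

Under that assumption, 6000 days (a bit over 16 years) pushes the cutoff past
2027, while 30 days leaves it in 2013.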


On Mon, Jun 3, 2013 at 10:35 AM, Tejas Patil <[email protected]> wrote:

> On Mon, Jun 3, 2013 at 6:53 AM, feng lu <[email protected]> wrote:
>
> > I see that Nutch 2.x uses the underlying operating system time to set
> > the fetch time, like this:
> >
> > fit.page.setFetchTime(System.currentTimeMillis());
> >
> > The granularity of the value depends on the underlying operating system,
> > so check your current OS time using the date command.
> >
> >
> > On Mon, Jun 3, 2013 at 8:57 PM, Bai Shen <[email protected]> wrote:
> >
> > > I'm using the 2.x head and even with adding 30 days I'm not getting any
> > > refetches.  I did a readdb on my injected url and it says that the
> > > fetch time is in 2027.
> >
>
> Can you share the crawl datum for that URL?
>
>
> > >
> > > Any idea why this would occur?
> >
>
> If it were a freshly injected URL, then I would go with Feng's advice.
>
> > > Will db.fetch.interval.max kick in and
> > > cause it to be fetched earlier?
> >
>
> Nope.
>
> > > Or will I have to manually change the
> > > fetchTime using the hbase shell?
> >
>
> I think so.
>
> >
> > > Thanks.
> > >
> >
> >
> >
> > --
> > Don't Grow Old, Grow Up... :-)
> >
>
