I'm looking at my base URL (the root of the internal site that I first
injected to start the crawl).  It shows a status of 2 (status_fetched).
The fetch time shows sometime in 2032.
The prev fetch time is May 20, 2013.
The fetch interval is the default 30 days.
0 retries since fetch.
Modified time is in 2026.
Prev modified time is 0.
Protocol status is SUCCESS.
Parse status is success/ok.
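
For reference, a minimal sketch of how those timestamps relate to the
generate step.  It assumes the fetch time is stored as epoch milliseconds
and that generate selects a page when its fetch time is not later than the
current time shifted by -adddays.  That is a simplified reading of the
scheduling logic, not the actual Nutch code.  The class and method names
(FetchTimeCheck, isDue, readable) and the exact dates are made up for
illustration.

```java
import java.time.Duration;
import java.time.Instant;

public class FetchTimeCheck {

    // Assumption: a page is due when its scheduled fetch time is not later
    // than "now" shifted forward by addDays (a simplified model of -adddays).
    public static boolean isDue(long fetchTimeMillis, long nowMillis, int addDays) {
        long shiftedNow = nowMillis + Duration.ofDays(addDays).toMillis();
        return fetchTimeMillis <= shiftedNow;
    }

    // Render an epoch-millisecond timestamp as an ISO-8601 UTC string.
    public static String readable(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    public static void main(String[] args) {
        // Hypothetical values: a crawl run in June 2013 and a fetch time in 2027.
        long now = Instant.parse("2013-06-03T00:00:00Z").toEpochMilli();
        long fetchTime = Instant.parse("2027-01-01T00:00:00Z").toEpochMilli();

        System.out.println("fetch time: " + readable(fetchTime));
        System.out.println("due with -adddays 30:   " + isDue(fetchTime, now, 30));
        System.out.println("due with -adddays 6000: " + isDue(fetchTime, now, 6000));
    }
}
```

Under that model, adding 30 days cannot reach a fetch time that is years
away, while adding 6000 days (roughly 16 years) can.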

How do I convert the metadata field to readable text?


On Mon, Jun 3, 2013 at 11:39 AM, Tejas Patil <[email protected]> wrote:

> "It's not a freshly injected url."
> I suspect that those urls were attempted to be fetched but that failed,
> and so their retry interval was incremented to a larger value. Can't say
> for sure though.
> Can you share the crawl datum? The status and meta fields can give some
> clue.
>
> On Mon, Jun 3, 2013 at 8:30 AM, Bai Shen <[email protected]> wrote:
>
> > The time on the machine is set correctly.
> >
> > It's an internal website.  Is there anything in particular in the crawl
> > datum you're looking for?
> >
> > It's not a freshly injected url.  And it seems like all of my urls have
> > the long fetch times.  And it seems odd that the max interval wouldn't
> > cause a fetch.
> >
> > I was able to do generate -adddays 6000 and run a fetch.  I'm waiting for
> > it to finish so I can check if this created long fetch times as well.
> >
> >
> > On Mon, Jun 3, 2013 at 10:35 AM, Tejas Patil <[email protected]> wrote:
> >
> > > On Mon, Jun 3, 2013 at 6:53 AM, feng lu <[email protected]> wrote:
> > >
> > > > I see that nutch2.x will use the underlying operating system time to
> > > > set the FetchTime, like this:
> > > >
> > > > fit.page.setFetchTime(System.currentTimeMillis());
> > > >
> > > > The granularity of the value depends on the underlying operating
> > > > system, so check your current OS time using the date command.
> > > >
> > > >
> > > > On Mon, Jun 3, 2013 at 8:57 PM, Bai Shen <[email protected]> wrote:
> > > >
> > > > > I'm using the 2.x head and even with adding 30 days I'm not getting
> > > > > any refetches.  I did a readdb on my injected url and it says that
> > > > > the fetch time is in 2027.
> > > >
> > >
> > > Can you share the crawl datum for that url?
> > >
> > >
> > > > >
> > > > > Any idea why this would occur?
> > > >
> > >
> > > If it was a freshly injected url, then I would go with Feng's advice.
> > >
> > > > > Will db.fetch.interval.max kick in and
> > > > > cause it to be fetched earlier?
> > > >
> > >
> > > nope.
> > >
> > > > > Or will I have to manually change the
> > > > > fetchTime using the hbase shell?
> > > >
> > >
> > > I think so.
> > >
> > > >
> > > > > Thanks.
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Don't Grow Old, Grow Up... :-)
> > > >
> > >
> >
>
