Re: about time for recrawl a url

feng lu Fri, 06 Sep 2013 07:34:52 -0700

Hi Eyeris,

Use this command:


bin/nutch readdb <crawldb> -url <url>

Example output:

$ bin/nutch readdb crawldb/ -url http://news.163.com/05/0920/16/1U3U7N9P0001121R.html
URL: http://news.163.com/05/0920/16/1U3U7N9P0001121R.html
Version: 7
Status: 1 (db_unfetched)
Fetch time: Sun Aug 11 00:43:13 CST 2013
Modified time: Thu Jan 01 08:00:00 CST 1970
Retries since fetch: 0
Retry interval: 2592000 seconds (30 days)
Score: 4.4117645E-5
Signature: null
Metadata:

You can see the Fetch time and the Retry interval; the next fetch time equals
the fetch time plus the retry interval.
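As a rough illustration of that arithmetic, here is a minimal Python sketch that parses a readdb record like the one above and adds the retry interval to the fetch time. It assumes the exact "Key: value" layout of the sample output and simply drops the time-zone abbreviation (CST), so the result is a naive local time. The helper names are hypothetical and not part of Nutch.

```python
from datetime import datetime, timedelta

# Sample record in the format printed by `bin/nutch readdb <crawldb> -url <url>`
# (copied from the example output above).
READDB_OUTPUT = """\
URL: http://news.163.com/05/0920/16/1U3U7N9P0001121R.html
Version: 7
Status: 1 (db_unfetched)
Fetch time: Sun Aug 11 00:43:13 CST 2013
Modified time: Thu Jan 01 08:00:00 CST 1970
Retries since fetch: 0
Retry interval: 2592000 seconds (30 days)
Score: 4.4117645E-5
Signature: null
Metadata:
"""


def parse_fields(text: str) -> dict:
    """Split each 'Key: value' line on its first colon into a dict entry."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def parse_fetch_time(value: str) -> datetime:
    """Parse e.g. 'Sun Aug 11 00:43:13 CST 2013', ignoring the zone name."""
    parts = value.split()
    # Drop the time-zone abbreviation (5th token); names like CST are
    # ambiguous, so the result is treated as a naive local time.
    without_zone = " ".join(parts[:4] + parts[5:])
    return datetime.strptime(without_zone, "%a %b %d %H:%M:%S %Y")


def parse_retry_interval(value: str) -> timedelta:
    """Parse e.g. '2592000 seconds (30 days)' using the leading number."""
    return timedelta(seconds=int(value.split()[0]))


fields = parse_fields(READDB_OUTPUT)
fetch_time = parse_fetch_time(fields["Fetch time"])
interval = parse_retry_interval(fields["Retry interval"])
# Next fetch time = fetch time + retry interval, as described above.
next_fetch = fetch_time + interval
print(next_fetch)
```

For the sample record this prints 2013-09-10 00:43:13, i.e. 30 days after Aug 11 00:43:13.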


On Fri, Sep 6, 2013 at 10:26 PM, Eyeris Rodriguez Rueda <[email protected]> wrote:

> Hi all.
> I want to know about the recrawl time for a URL. Any idea where I can
> learn about that?
> I'm using Nutch 1.5.1.
>
> I know that initially the next fetch time is based on the
> db.fetch.interval.default property, and that this time is changed by
> db.fetch.schedule.adaptive.inc_rate and db.fetch.schedule.adaptive.dec_rate.
> But how can I check the next fetch time for one URL, and which commands
> can I use?
> Some help will be appreciated.
> Thanks.
>



-- 
Don't Grow Old, Grow Up... :-)
