Some time ago I had the same problem with Nutch 1.0 and found a bug.

https://issues.apache.org/jira/browse/NUTCH-774
https://issues.apache.org/jira/browse/NUTCH-773

You will find patches there.
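
To see why a fetch interval of zero makes the same page come up in every round, here is a rough Python sketch of the scheduling rule discussed in the quoted thread below. It is only a simplified model, not Nutch's actual code: the function names (next_fetch_time, is_due, count_fetches) and the 10-minute round length are made up for illustration.

```python
from datetime import datetime, timedelta


def next_fetch_time(fetched_at: datetime, interval_days: float) -> datetime:
    """After a fetch, the page is scheduled again interval_days later."""
    return fetched_at + timedelta(days=interval_days)


def is_due(fetch_time: datetime, now: datetime, add_days: float = 0.0) -> bool:
    """A page is selected when its scheduled fetch time is not later than
    the current time shifted forward by add_days (the adddays idea)."""
    return fetch_time <= now + timedelta(days=add_days)


def count_fetches(interval_days: float, rounds: int, add_days: float = 0.0,
                  minutes_per_round: int = 10) -> int:
    """Count how often one page is fetched over several crawl rounds.

    Hypothetical setup: the page is new (due immediately) and each round
    takes minutes_per_round minutes.
    """
    now = datetime(2010, 1, 21, 8, 0, 0)
    fetch_time = now
    fetched = 0
    for _ in range(rounds):
        if is_due(fetch_time, now, add_days):
            fetched += 1
            fetch_time = next_fetch_time(now, interval_days)
        now += timedelta(minutes=minutes_per_round)
    return fetched


if __name__ == "__main__":
    print(count_fetches(0.0, 10))                # interval 0: due every round
    print(count_fetches(7.0, 10))                # interval 7 days: fetched once
    print(count_fetches(7.0, 10, add_days=8.0))  # adddays > interval: due every round
```

In this model an interval of 0.0 days puts the next fetch time at the moment of the fetch itself, so the page is always due again. The same happens when adddays is larger than the interval.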

Sunnyvale Fl wrote:
> You know, you are right.  I dumped the db entry for another URL and its
> retry interval is 0.0.  For the same crawl, some URLs have a retry interval
> of 7.0.  Why is that?  I have db.default.fetch.interval set to 7.0 in
> nutch-site.xml.  Thanks!
>
> Version: 5
> Status: 2 (db_fetched)
> Fetch time: Thu Jan 21 08:55:24 PST 2010
> Modified time: Wed Dec 31 16:00:00 PST 1969
> Retries since fetch: 0
> Retry interval: 0.0 days
> Score: 0.0
> Signature: 09854146546e5e7fe5def1e1add23037
> Metadata: _pst_:success(1), lastModified=0
>
>
> On Thu, Jan 21, 2010 at 5:50 PM, reinhard schwab
> <reinhard.sch...@aon.at> wrote:
>
>   
>> Yes, I mean that.
>> In the Java classes it is called the fetch interval; see the CrawlDatum class.
>> Do you use the adddays option when generating the segment?
>> If its value is higher than the fetch interval, it can also happen
>> that you crawl a page again and again.
>>
>> The fetch time in your entry is Nov 06 2009.
>> The page was last fetched before this date
>> and has not been refetched since.
>>
>>
>> Sunnyvale Fl wrote:
>>     
>>> You mean the retry interval?  According to readdb it is 7 days:
>>>
>>> Version: 5
>>> Status: 2 (db_fetched)
>>> Fetch time: Fri Nov 06 07:48:54 PST 2009
>>> Modified time: Wed Dec 31 16:00:00 PST 1969
>>> Retries since fetch: 0
>>> Retry interval: 7.0 days
>>> Score: 0.0
>>> Signature: 5ec8dc313a9ae4d61c6e8c9d9c18ea26
>>> Metadata: _pst_:success(1), lastModified=0
>>>
>>>
>>> On Thu, Jan 21, 2010 at 5:00 PM, reinhard schwab
>>> <reinhard.sch...@aon.at> wrote:
>>>
>>>
>>>       
>>>> Using "nutch readdb" you can dump the entry for the page.
>>>> I suspect that the fetch interval of this page is zero.
>>>>
>>>> Sunnyvale Fl wrote:
>>>>
>>>>         
>>>>> Hi,
>>>>> I am using Nutch 0.9.1 and I am having this weird problem - it will
>>>>> repeatedly fetch the same page without error.  So if I let it run to 10
>>>>> levels deep, the same page will be fetched 10 times.  What's wrong?
>>>>>
>>>>> Thanks!
>>>>
>>>>         
>>>       
>>     
>
>   
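
To check a dumped entry for this condition, a small Python sketch like the following could parse the text that readdb prints (as quoted above) and read off the retry interval. The helper names are hypothetical, and the record format is assumed to be exactly the "Key: value" lines shown in this thread.

```python
# Record text copied from the readdb output quoted in this thread.
RECORD = """\
Version: 5
Status: 2 (db_fetched)
Fetch time: Thu Jan 21 08:55:24 PST 2010
Modified time: Wed Dec 31 16:00:00 PST 1969
Retries since fetch: 0
Retry interval: 0.0 days
Score: 0.0
Signature: 09854146546e5e7fe5def1e1add23037
Metadata: _pst_:success(1), lastModified=0
"""


def parse_record(text: str) -> dict:
    """Split each 'Key: value' line at the first colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def retry_interval_days(fields: dict) -> float:
    """Turn a value such as '7.0 days' into the number 7.0."""
    return float(fields["Retry interval"].split()[0])


if __name__ == "__main__":
    fields = parse_record(RECORD)
    if retry_interval_days(fields) == 0.0:
        print("retry interval is zero: this page would be due in every round")
```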
