some time ago i had the same problem with nutch 1.0 and discovered a bug.
https://issues.apache.org/jira/browse/NUTCH-774
https://issues.apache.org/jira/browse/NUTCH-773
you will find patches there.

Sunnyvale Fl wrote:
> You know you are right. I dump db for another url and the retry interval is
> 0.0. For the same crawl, some url's retry interval is 7.0. Why is that? I
> have db.default.fetch.interval set to 7.0 in nutch-site.xml. Thanks!
>
> Version: 5
> Status: 2 (db_fetched)
> Fetch time: Thu Jan 21 08:55:24 PST 2010
> Modified time: Wed Dec 31 16:00:00 PST 1969
> Retries since fetch: 0
> Retry interval: 0.0 days
> Score: 0.0
> Signature: 09854146546e5e7fe5def1e1add23037
> Metadata: _pst_:success(1), lastModified=0
>
> On Thu, Jan 21, 2010 at 5:50 PM, reinhard schwab
> <reinhard.sch...@aon.at> wrote:
>
>> yes, i mean that.
>> in the java classes, it is called fetch interval, see CrawlDatum class.
>> do you use the adddays option when generating the segment?
>> if the value is higher than the fetch interval, then it can also happen
>> that you crawl a page again and again.
>>
>> the fetch time in your entry is Nov 06 2009.
>> the last time it has been fetched is before this date.
>> it has not been refetched since that time.
>>
>> Sunnyvale Fl wrote:
>>
>>> You mean the retry interval? It is 7 days from readdb -
>>>
>>> Version: 5
>>> Status: 2 (db_fetched)
>>> Fetch time: Fri Nov 06 07:48:54 PST 2009
>>> Modified time: Wed Dec 31 16:00:00 PST 1969
>>> Retries since fetch: 0
>>> Retry interval: 7.0 days
>>> Score: 0.0
>>> Signature: 5ec8dc313a9ae4d61c6e8c9d9c18ea26
>>> Metadata: _pst_:success(1), lastModified=0
>>>
>>> On Thu, Jan 21, 2010 at 5:00 PM, reinhard schwab
>>> <reinhard.sch...@aon.at> wrote:
>>>
>>>> using "nutch readdb" you can dump the entry of the page.
>>>> i believe that the fetch interval of this page is zero.
>>>>
>>>> Sunnyvale Fl wrote:
>>>>
>>>>> Hi,
>>>>> I am using Nutch 0.9.1 and I am having this weird problem - it will
>>>>> repeatedly fetch the same page without error. So if I let it run to 10
>>>>> levels deep, the same page will be fetched 10 times. What's wrong?
>>>>>
>>>> Thanks!
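
to illustrate why a fetch interval of zero (or an adddays value larger than the interval) makes the same page come up in every round, here is a minimal sketch in Python. it is a simplified model and not Nutch's actual code: the function names is_due, next_fetch_after and count_fetches are hypothetical, and it assumes that a page is selected when its next fetch time is not later than the current time shifted by adddays, and that the next fetch time is the fetch time plus the interval.

```python
from datetime import datetime, timedelta


def is_due(next_fetch, now, add_days=0.0):
    """Assumed selection rule: the page is picked when its next fetch time
    is not later than 'now' shifted forward by add_days."""
    return next_fetch <= now + timedelta(days=add_days)


def next_fetch_after(fetched_at, interval_days):
    """Assumed scheduling rule: next fetch time = fetch time + interval."""
    return fetched_at + timedelta(days=interval_days)


def count_fetches(interval_days, rounds, add_days=0.0, round_gap_minutes=10):
    """Count how often one page is fetched over consecutive crawl rounds."""
    now = datetime(2010, 1, 21, 8, 0, 0)
    next_fetch = now  # a freshly injected page is due immediately
    fetches = 0
    for _ in range(rounds):
        if is_due(next_fetch, now, add_days):
            fetches += 1
            next_fetch = next_fetch_after(now, interval_days)
        now += timedelta(minutes=round_gap_minutes)  # time until the next round
    return fetches


if __name__ == "__main__":
    print(count_fetches(interval_days=0.0, rounds=10))                # 10
    print(count_fetches(interval_days=7.0, rounds=10))                # 1
    print(count_fetches(interval_days=7.0, rounds=10, add_days=8.0))  # 10
```

with an interval of 0.0 days the page is due again in every round, which matches "10 levels deep, fetched 10 times". with 7.0 days it is fetched once, unless adddays pushes the comparison time past the interval.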