Hi - db.fetch.retry.max only sets the status to DB_GONE if the retry value exceeds it. It really doesn't do much more than that. You can use a custom scheduler or set db.gone.interval.max to true. This is only for Nutch 1.8, if I remember correctly.
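
To see how many records are actually past the limit, something like the following rough sketch could be run over the readdb -stats output. It is not part of Nutch; the function names are made up for illustration, and the default limit of 3 is simply the db.fetch.retry.max value mentioned in the quoted message. The sample lines are copied from that output.

```python
import re

# Matches the "retry N: COUNT" entries printed by readdb -stats.
RETRY_ENTRY = re.compile(r"retry (\d+):\s+(\d+)")


def retry_histogram(stats_text):
    """Return {retry count: number of URLs} parsed from the stats text."""
    return {int(retry): int(urls) for retry, urls in RETRY_ENTRY.findall(stats_text)}


def urls_over_limit(histogram, retry_max=3):
    """Sum the URLs whose retry count exceeds retry_max (assumed limit)."""
    return sum(urls for retry, urls in histogram.items() if retry > retry_max)


# A few lines taken from the quoted output, for illustration only.
sample = """\
13/12/29 15:24:35 INFO crawl.CrawlDbReader: retry 0: 221451536
13/12/29 15:24:35 INFO crawl.CrawlDbReader: retry 3: 62275
13/12/29 15:24:35 INFO crawl.CrawlDbReader: retry 4: 46550
13/12/29 15:24:35 INFO crawl.CrawlDbReader: retry 18: 6356
"""

hist = retry_histogram(sample)
for retry, urls in sorted(hist.items()):  # numeric order, unlike the log
    print(retry, urls)
print("over limit:", urls_over_limit(hist))
```

Sorting the keys as integers also puts the retry buckets in numeric order, which the log output does not do.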
-----Original message-----
> From: Martin Aesch <[email protected]>
> Sent: Sunday 29th December 2013 17:57
> To: [email protected]
> Subject: nutch retries
>
> Dear nutchers,
>
> below is an output of nutch-1.7 readdb -stats. Why is this retry count going
> so high?
>
> In nutch-default, there is db.fetch.retry.max set to 3. I did not overwrite
> this property. Anything I missed?
>
> Thanks,
> Martin
>
> 13/12/29 15:24:35 INFO crawl.CrawlDbReader: Statistics for CrawlDb: crawl/crawldb
> 13/12/29 15:24:35 INFO crawl.CrawlDbReader: TOTAL urls: 222298055
> 13/12/29 15:24:35 INFO crawl.CrawlDbReader: retry 0: 221451536
> 13/12/29 15:24:35 INFO crawl.CrawlDbReader: retry 1: 393954
> 13/12/29 15:24:35 INFO crawl.CrawlDbReader: retry 10: 13831
> 13/12/29 15:24:35 INFO crawl.CrawlDbReader: retry 11: 13833
> 13/12/29 15:24:35 INFO crawl.CrawlDbReader: retry 12: 13615
> 13/12/29 15:24:35 INFO crawl.CrawlDbReader: retry 13: 13981
> 13/12/29 15:24:35 INFO crawl.CrawlDbReader: retry 14: 13649
> 13/12/29 15:24:35 INFO crawl.CrawlDbReader: retry 15: 14691
> 13/12/29 15:24:35 INFO crawl.CrawlDbReader: retry 16: 14549
> 13/12/29 15:24:35 INFO crawl.CrawlDbReader: retry 17: 32747
> 13/12/29 15:24:35 INFO crawl.CrawlDbReader: retry 18: 6356
> 13/12/29 15:24:35 INFO crawl.CrawlDbReader: retry 2: 111174
> 13/12/29 15:24:35 INFO crawl.CrawlDbReader: retry 3: 62275
> 13/12/29 15:24:35 INFO crawl.CrawlDbReader: retry 4: 46550
> 13/12/29 15:24:35 INFO crawl.CrawlDbReader: retry 5: 35149
> 13/12/29 15:24:35 INFO crawl.CrawlDbReader: retry 6: 17968
> 13/12/29 15:24:35 INFO crawl.CrawlDbReader: retry 7: 15727
> 13/12/29 15:24:35 INFO crawl.CrawlDbReader: retry 8: 13339
> 13/12/29 15:24:35 INFO crawl.CrawlDbReader: retry 9: 13131
> 13/12/29 15:24:35 INFO crawl.CrawlDbReader: min score: 0.0
> 13/12/29 15:24:35 INFO crawl.CrawlDbReader: avg score: 0.037087735
> 13/12/29 15:24:35 INFO crawl.CrawlDbReader: max score: 587.999
> 13/12/29 15:24:35 INFO crawl.CrawlDbReader: status 1 (db_unfetched): 158627810
> 13/12/29 15:24:35 INFO crawl.CrawlDbReader: status 2 (db_fetched): 58450261
> 13/12/29 15:24:35 INFO crawl.CrawlDbReader: status 3 (db_gone): 2755726
> 13/12/29 15:24:35 INFO crawl.CrawlDbReader: status 4 (db_redir_temp): 889557
> 13/12/29 15:24:35 INFO crawl.CrawlDbReader: status 5 (db_redir_perm): 1574578
> 13/12/29 15:24:35 INFO crawl.CrawlDbReader: status 6 (db_notmodified): 123
> 13/12/29 15:24:35 INFO crawl.CrawlDbReader: CrawlDb statistics: done

