Hi, I am using Nutch 0.8 with map-reduce across 3 machines, and I am getting many unfetched URLs. The unfetched URLs are supposed to be fetched on the next round of crawling, but they are not. I started from 80000 URLs, and after 3 cycles here are the statistics:
060117 115357 Statistics for CrawlDb: h3/crawldb
060117 115357 TOTAL urls: 1146568
060117 115357 avg score: 1.062
060117 115357 max score: 289.219
060117 115357 min score: 1.0
060117 115357 retry 0: 1143658
060117 115357 retry 1: 2763
060117 115357 retry 2: 147
060117 115357 status 1 (DB_unfetched): 1025906
060117 115357 status 2 (DB_fetched): 117705
060117 115357 status 3 (DB_gone): 2957
060117 115357 CrawlDb statistics: done

And after the fourth depth I have:

060117 114219 Statistics for CrawlDb: h4/crawldb
060117 114219 TOTAL urls: 2194746
060117 114219 avg score: 1.074
060117 114219 max score: 747.629
060117 114219 min score: 1.0
060117 114219 retry 0: 2185706
060117 114219 retry 1: 7999
060117 114219 retry 2: 910
060117 114219 retry 3: 131
060117 114219 status 1 (DB_unfetched): 1916193
060117 114219 status 2 (DB_fetched): 271924
060117 114219 status 3 (DB_gone): 6629
060117 114219 CrawlDb statistics: done

Does anyone have any idea why these URLs are hanging and not getting fetched?

Thanks,
Mike
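For context, the per-depth cycle that produces stats like the above is generate / fetch / updatedb, followed by readdb -stats. Below is a sketch of that loop as a dry run that only echoes the commands; the `h3` directory name comes from the stats above, while the fixed segment names and the DEPTH value are placeholders (real runs use timestamped segment directories created by generate):

```shell
CRAWL=h3        # crawl directory, taken from the stats output above
DEPTH=3         # number of rounds; an assumption for this sketch
CMDS=""
for i in $(seq 1 $DEPTH); do
  # placeholder segment name; generate actually creates a timestamped dir
  SEG="$CRAWL/segments/round$i"
  CMDS="$CMDS
bin/nutch generate $CRAWL/crawldb $CRAWL/segments
bin/nutch fetch $SEG
bin/nutch updatedb $CRAWL/crawldb $SEG"
done
CMDS="$CMDS
bin/nutch readdb $CRAWL/crawldb -stats"
echo "$CMDS"
```

Note that if generate is run with a -topN limit, only that many of the highest-scoring URLs are selected per round, so a large DB_unfetched count can persist across depths even when fetching itself succeeds.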