Hi,

I am using Nutch 0.8 with MapReduce over 3 machines, and I am getting a large
number of unfetched URLs. The unfetched URLs are supposed to be fetched on the
next round of crawling, but they are not. I started from 80,000 URLs, and after
3 cycles here are the statistics:


060117 115357 Statistics for CrawlDb: h3/crawldb
060117 115357 TOTAL urls:       1146568
060117 115357 avg score:        1.062
060117 115357 max score:        289.219
060117 115357 min score:        1.0
060117 115357 retry 0:  1143658
060117 115357 retry 1:  2763
060117 115357 retry 2:  147
060117 115357 status 1 (DB_unfetched):  1025906
060117 115357 status 2 (DB_fetched):    117705
060117 115357 status 3 (DB_gone):       2957
060117 115357 CrawlDb statistics: done


and after the fourth depth I have:


060117 114219 Statistics for CrawlDb: h4/crawldb
060117 114219 TOTAL urls:       2194746
060117 114219 avg score:        1.074
060117 114219 max score:        747.629
060117 114219 min score:        1.0
060117 114219 retry 0:  2185706
060117 114219 retry 1:  7999
060117 114219 retry 2:  910
060117 114219 retry 3:  131
060117 114219 status 1 (DB_unfetched):  1916193
060117 114219 status 2 (DB_fetched):    271924
060117 114219 status 3 (DB_gone):       6629
060117 114219 CrawlDb statistics: done
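
For context, each depth above corresponds roughly to the following cycle; the
paths and the way I pick the latest segment are illustrative here, not my exact
scripts:

```shell
# One crawl cycle (depth), sketched with illustrative paths.
NUTCH=bin/nutch
CRAWLDB=h4/crawldb
SEGMENTS=h4/segments

# 1. Generate a fetch list from the crawldb into a new segment.
$NUTCH generate $CRAWLDB $SEGMENTS

# 2. Fetch the newest segment (assumes segment dirs sort by timestamp).
SEGMENT=$SEGMENTS/$(ls $SEGMENTS | tail -1)
$NUTCH fetch $SEGMENT

# 3. Fold the fetch results back into the crawldb.
$NUTCH updatedb $CRAWLDB $SEGMENT

# 4. Dump the statistics pasted above.
$NUTCH readdb $CRAWLDB -stats
```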



Does anyone have any idea why these URLs are hanging and not getting fetched?

Thanks, Mike
