Hi, I have set up four boxes using MapReduce, and everything runs smoothly. I fed in about 80,000 seed URLs to begin with and crawled to depth 2. Only about 1,900 pages (about 300 MB of data) were fetched; the rest are marked as DB_unfetched. Does anyone know what could be wrong?
This is the output of (bin/nutch readdb h2/crawldb -stats):

060115 171625 Statistics for CrawlDb: h2/crawldb
060115 171625 TOTAL urls: 99403
060115 171625 avg score: 1.01
060115 171625 max score: 7.382
060115 171625 min score: 1.0
060115 171625 retry 0: 99403
060115 171625 status 1 (DB_unfetched): 97470
060115 171625 status 2 (DB_fetched): 1933
060115 171625 CrawlDb statistics: done

Thanks,
Mike
