Hi,

I have set up four boxes using MapReduce and everything runs smoothly. I
fed in about 80,000 seed URLs to begin with and crawled to depth 2.
Only about 1,900 pages (about 300 MB of data) were fetched; the rest are
marked as DB_unfetched. Does anyone know what could be wrong?

This is the output of (bin/nutch readdb h2/crawldb -stats):

060115 171625 Statistics for CrawlDb: h2/crawldb
060115 171625 TOTAL urls:       99403
060115 171625 avg score:        1.01
060115 171625 max score:        7.382
060115 171625 min score:        1.0
060115 171625 retry 0:  99403
060115 171625 status 1 (DB_unfetched):  97470
060115 171625 status 2 (DB_fetched):    1933
060115 171625 CrawlDb statistics: done
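
To put those numbers in perspective, here is a minimal Python sketch that parses the relevant stats lines and computes the fetched fraction. It is not part of Nutch; `parse_stats` is a hypothetical helper, and the regular expression assumes every line has the two-field timestamp prefix shown above.

```python
import re

# Stats lines copied from the readdb output above.
STATS = """\
060115 171625 TOTAL urls:       99403
060115 171625 status 1 (DB_unfetched):  97470
060115 171625 status 2 (DB_fetched):    1933
"""


def parse_stats(text):
    """Return a dict mapping each label (e.g. 'TOTAL urls') to its integer count."""
    counts = {}
    for line in text.splitlines():
        # Skip the two timestamp fields, then split "label: value".
        m = re.match(r"\d+\s+\d+\s+(.+?):\s+(\d+)\s*$", line)
        if m:
            counts[m.group(1)] = int(m.group(2))
    return counts


counts = parse_stats(STATS)
total = counts["TOTAL urls"]
fetched = counts["status 2 (DB_fetched)"]
unfetched = counts["status 1 (DB_unfetched)"]
fetched_pct = 100.0 * fetched / total

print(f"fetched {fetched} of {total} ({fetched_pct:.2f}%), unfetched {unfetched}")
```

So the fetched and unfetched counts add up to the total, and only about 2% of the URLs in the crawldb have been fetched.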

Thanks,
Mike
