Hello,

I have some questions about the Nutch crawl statistics.
I ran five crawls with topN=12500 and depth=2, 4, 7, 10, and 11, with the following results:
https://spreadsheets.google.com/ccc?key=0AvF8Ig446DzEdGNxaDNLLTgtUzdoTVNzQTJIcVFSZXc&hl=es#gid=0


Why is the number of TOTAL URLs not equal to (db_fetched + db_unfetched +
db_gone)?

I expected to get about 125000 TOTAL URLs (topN=12500 times depth=10),
but I got only 34000 URLs (27% of the expected total). Is this difference due
only to the regex URL filters?
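
For reference, the arithmetic behind my expectation can be sketched in a few lines of Python. This assumes that every round selects and fetches a full topN of new URLs, which is only an upper bound and not something Nutch guarantees. The variable names are my own.

```python
# Assumption: each of the `depth` rounds fetches a full topN new URLs,
# so topN * depth is an upper bound, not a guaranteed total.
top_n = 12500
depth = 10
observed_total = 34000  # TOTAL URLs reported for the depth=10 crawl

expected_upper_bound = top_n * depth
ratio = observed_total / expected_upper_bound

print(expected_upper_bound)
print(f"{ratio:.1%}")
```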

When db_gone decreases (for example, comparing crawl2 with crawl3), does that
mean that some URLs which were unavailable in the past have now been fetched?

Thanks for your help!

Regards
Patricio

