Hello, I have some questions about the Nutch statistics. I ran five crawls with topN=12500 and depth=2, 4, 7, 10 and 11, with the following results: https://spreadsheets.google.com/ccc?key=0AvF8Ig446DzEdGNxaDNLLTgtUzdoTVNzQTJIcVFSZXc&hl=es#gid=0
Why is the number of TOTAL URLs not equal to (db_fetched + db_unfetched + db_gone)?

I expected about 125000 TOTAL URLs (using topN=12500, depth=10), but I got only 34000 URLs (about 27% of the expected total). Is this difference due only to the regex-urlfilters?

When db_gone decreases (for example, comparing crawl2 with crawl3), does that mean that some URLs which were not available in the past have now been fetched?

Thanks for your help!

Regards,
Patricio
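
P.S. To make the expected figure concrete, here is a small sketch of the arithmetic behind it. It assumes, as I did, that each of the depth rounds fetches a full topN URLs, so the product is only an upper bound. The function names are hypothetical and not part of Nutch; the numbers are the ones quoted above.

```python
# Hypothetical helpers (not Nutch APIs) that reproduce the arithmetic in the question.

def expected_upper_bound(top_n: int, depth: int) -> int:
    """Upper bound on fetched URLs if every round selects exactly top_n URLs."""
    return top_n * depth


def observed_share(observed: int, expected: int) -> float:
    """Fraction of the expected total that was actually observed."""
    return observed / expected


expected = expected_upper_bound(12500, 10)   # topN=12500, depth=10
share = observed_share(34000, expected)      # 34000 TOTAL URLs reported
print(f"expected at most {expected} URLs, observed share {share:.1%}")
# prints: expected at most 125000 URLs, observed share 27.2%
```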

