I am using Nutch 1.0. After every updatedb, I take the stats with the sort parameter, which gives detailed statistics on the domains and their counts (the number of URLs for each domain in the crawldb). However, I see that a variable number of domains do not make it into the next round of statistics.
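To make the comparison between rounds concrete, here is a minimal sketch of how the saved stats output of consecutive rounds could be diffed. It assumes each domain appears on a line shaped like `example.com : 42`; that line format and the function names `parse_domains` and `missing_between_rounds` are assumptions for illustration, not Nutch's documented output or API.

```python
def parse_domains(stats_text):
    """Collect domain names from lines assumed to look like 'example.com : 42'."""
    domains = set()
    for line in stats_text.splitlines():
        # Split on the last colon: left side is the name, right side the count.
        name, sep, count = line.rpartition(":")
        # Keep only lines with an integer count and a dotted name (assumed format).
        if sep and count.strip().isdigit() and "." in name:
            domains.add(name.strip())
    return domains


def missing_between_rounds(rounds):
    """For each pair of consecutive rounds, list domains that were in the
    earlier round but are absent from the later one."""
    return [sorted(prev - cur) for prev, cur in zip(rounds, rounds[1:])]


# Hypothetical stats output from two rounds, for illustration only.
round1 = parse_domains("a.com : 3\nb.org : 5\n")
round2 = parse_domains("a.com : 4\n")
dropped = missing_between_rounds([round1, round2])
```

Running this over the stats saved after each updatedb would give the exact list of domains that dropped out at each round, which is easier to report than a count.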
Example: a domain may be present for 4 rounds of crawling (as seen with readdb -stats -sort) and then disappear from the following rounds. Or a domain may be present for the first two rounds, disappear from the stats for the next few rounds, and then reappear. Is it possible that domains are removed from the crawldb and then added back later?

Regards,
Gaurav

