There is a command that shows statistics for your database of links (the crawldb). It reports how many URLs have been fetched, if any, and how many are still waiting to be fetched. Keep in mind that a page that cannot be retrieved during a fetch is not indexed, so treat these numbers only as an estimate of the final indexed count.
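
If you want to pull the fetched and unfetched counts out of the stats output with a script, something like the rough Python sketch below may help. It assumes the output contains lines shaped like "status 2 (db_fetched): 3500". The sample text, the status labels, and the numbers are made up for illustration, and the exact labels may differ between Nutch versions, so check them against your own output first.

```python
import re

# Hypothetical sample of stats output. The labels and numbers are
# illustrative only and may not match your Nutch version exactly.
SAMPLE = """\
TOTAL urls: 30000
status 1 (db_unfetched): 26000
status 2 (db_fetched): 3500
status 3 (db_gone): 500
"""

# Matches lines such as "status 2 (db_fetched): 3500" (assumed format).
STATUS_LINE = re.compile(r"status\s+\d+\s+\((\w+)\):\s*(\d+)")


def parse_status_counts(text):
    """Return a dict mapping each lowercased status name to its count."""
    counts = {}
    for match in STATUS_LINE.finditer(text):
        counts[match.group(1).lower()] = int(match.group(2))
    return counts


counts = parse_status_counts(SAMPLE)
fetched = counts.get("db_fetched", 0)
unfetched = counts.get("db_unfetched", 0)
print(f"fetched so far: {fetched}, still waiting: {unfetched}")
```

The fetched count is only an upper bound on what ends up in the index, for the reason given above.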
The command is below. It can take minutes or even hours to complete, depending on the size of your database.

    bin/nutch readdb [path to crawldb] -stats

----- Original Message ----
From: bbrown <[EMAIL PROTECTED]>
To: [email protected]
Sent: Wednesday, May 16, 2007 4:42:05 PM
Subject: Generic Question about initial seed

This is kind of a generic question. Are there any stats on how many pages will get crawled based on some initial seed? For example, if you seed the list from dmoz, how many pages will get indexed? Let's say there are 4 million; will only 4 million get indexed? Or let's say I have 4000; will I get 30,000 crawled/indexed pages?

--
Berlin Brown
[berlin dot brown at gmail dot com]
http://botspiritcompany.com/botlist/?
