On Wed, 16 May 2007 16:42:05 -0400, bbrown wrote
> This is kind of a generic question. Are there any stats on how many 
> pages will get crawled based on some initial seed.  For example, if 
> you seed the list from dmoz, how many pages will get indexed? Let's
> say there are 4 million; will only those 4 million get indexed?
> 
> Or let's say I have 4,000; will I get 30,000 crawled/indexed pages?
> 
> --
> Berlin Brown
> [berlin dot brown at gmail dot com]
> http://botspiritcompany.com/botlist/?

Sorry, let's say I use an average depth of 3. I am asking because I have
about 8,000 article pages (blogs, news articles) that I want Nutch to crawl
on a regular basis, and I would like an idea of how many pages will end up
in the index.
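For a back-of-envelope answer, here is a minimal sketch, assuming Nutch's one-step crawl tool, where depth counts generate/fetch/update rounds (round 1 fetches only the seeds) and the optional -topN setting caps the URLs fetched per round. The function name and the new_links_per_page parameter are hypothetical: it stands for the average number of previously unseen outlinks each fetched page adds. Real crawls lose many URLs to duplicates, URL filters, and fetch failures, so treat the result as an upper bound, not a prediction.

```python
def estimate_crawl_size(seeds, depth, new_links_per_page, top_n=None):
    """Rough upper bound on pages fetched by a depth-limited crawl.

    Hypothetical model: each round fetches up to `top_n` URLs from the
    frontier (all of them if `top_n` is None); every fetched page adds
    `new_links_per_page` unseen URLs; unfetched URLs carry over.
    """
    total = 0
    frontier = seeds
    for _ in range(depth):
        # URLs actually fetched this round, capped by top_n if given
        fetched = frontier if top_n is None else min(frontier, top_n)
        total += fetched
        # leftover URLs stay queued; fetched pages contribute new outlinks
        frontier = (frontier - fetched) + fetched * new_links_per_page
    return total
```

With 8,000 seeds, depth 3, and an assumed 10 new links per page, this gives 8,000 + 80,000 + 800,000 = 888,000 pages with no cap, but only 108,000 with a cap of 50,000 URLs per round, which is why the per-round cap usually matters more than the seed count.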

--
Berlin Brown
[berlin dot brown at gmail dot com]
http://botspiritcompany.com/botlist/?

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general