Hi,

I'd say 5K is big enough.  You will still "waste" some time with fetchlists
that small, but you'll have to see for yourself whether you can live with
that.  I think running multiple smaller fetchlists will also use more of your
CPU because of the more frequent JVM launches and such.
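
To put rough numbers on that trade-off, here is a back-of-the-envelope sketch.
The per-job startup cost and per-URL fetch time below are made-up placeholder
values, not measurements from Nutch, and the function name is only for
illustration; plug in whatever you observe on your own cluster.

```python
def overhead_fraction(urls_per_job, seconds_per_url, fixed_overhead_seconds):
    """Share of one job's wall-clock time that goes to fixed startup cost
    (JVM launch, job setup and so on) rather than to fetching."""
    fetch_seconds = urls_per_job * seconds_per_url
    return fixed_overhead_seconds / (fixed_overhead_seconds + fetch_seconds)


if __name__ == "__main__":
    # Hypothetical inputs: 60 s of fixed overhead per job, 0.1 s per URL.
    for size in (500, 5000, 50000):
        share = overhead_fraction(size, 0.1, 60.0)
        print(f"{size} URLs per fetchlist: {share:.1%} of the run is overhead")
```

The fixed cost is paid once per job, so splitting the same 5000 URLs into ten
smaller fetchlists pays it ten times.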

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


----- Original Message ----
> From: Chris Anderson <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Wednesday, June 11, 2008 5:53:38 PM
> Subject: Re: Fast indexing?
> 
> On Wed, Jun 11, 2008 at 10:11 AM,  wrote:
> > That's not realistic with Nutch, which was really designed for larger and
> > longer "fetch jobs" (more URLs).
> >
> 
> On the subject - is there a good rule of thumb for the smallest fetch
> jobs that would make sense to run with Nutch? We're running some
> bigger crawls, but also have a standing list of blog feeds (about
> 5000) that we plan to have Nutch refetch frequently.
> 
> Thanks!
> 
> -- 
> Chris Anderson
> http://jchris.mfdz.com
