Hi, I'd say 5K is big enough. You will "waste" some time with fetchlists that small, but you'll have to see for yourself whether you can live with that. I think running multiple smaller fetchlists will also use more of your CPU because of the more frequent JVM launches and such.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Chris Anderson <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Wednesday, June 11, 2008 5:53:38 PM
> Subject: Re: Fast indexing?
>
> On Wed, Jun 11, 2008 at 10:11 AM, wrote:
> > That's not realistic with Nutch, which was really designed for larger and longer "fetch jobs" (more URLs).
> >
>
> On the subject - is there a good rule of thumb for the smallest fetch
> jobs that would make sense to run with Nutch? We're running some
> bigger crawls, but also have a standing list of blog feeds (about
> 5000) that we plan to have Nutch refetch frequently.
>
> Thanks!
>
> --
> Chris Anderson
> http://jchris.mfdz.com
