The default tuning parameters are specified in nutch/conf/nutch-default.xml
and can be overridden in nutch/conf/nutch-site.xml. (They can also be
overridden on the crawl command line, but I believe the 'best practice' is
to configure settings in nutch-site.xml.)

I believe the two most valuable parameters for tuning the crawler are
'fetcher.threads.fetch' and 'fetcher.threads.per.host'. However, there are
many other tuning parameters, and you might get more value from some of the
timeout parameters. (You might also want to look at tuning your JVM heap
space, but I've never seen a real need to tweak it.)
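
As a rough sketch of what such an override might look like, here is a
nutch-site.xml that sets the two thread parameters plus one timeout. The
values are illustrative assumptions only, not recommendations, and
'http.timeout' is just my guess at one of the timeout parameters worth
looking at. Check the names and defaults against the nutch-default.xml that
ships with your version before copying anything.

```xml
<?xml version="1.0"?>
<!-- nutch/conf/nutch-site.xml: entries here override nutch-default.xml -->
<configuration>

  <!-- Total number of fetcher threads (value is an illustrative guess) -->
  <property>
    <name>fetcher.threads.fetch</name>
    <value>20</value>
  </property>

  <!-- Threads allowed to hit a single host at once; keep this low to stay
       polite to the sites being crawled (illustrative value) -->
  <property>
    <name>fetcher.threads.per.host</name>
    <value>2</value>
  </property>

  <!-- Network timeout in milliseconds (assumed example of a timeout
       parameter; illustrative value) -->
  <property>
    <name>http.timeout</name>
    <value>15000</value>
  </property>

</configuration>
```

Only the properties you want to change need to appear in nutch-site.xml;
anything not listed keeps its value from nutch-default.xml.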

As for resuming a failed crawl, I don't know of any way to do so. I always
discard the crawl and restart.

-- 
View this message in context: 
http://old.nabble.com/What-are-the-configuration-parameters-to-fine-tune-Nutch-performance-tp26125943p26250181.html
Sent from the Nutch - User mailing list archive at Nabble.com.
