Hello fellow Nutchers,

I'm now trying out a "real" crawl, versus the test crawl that I mentioned in my previous email.

One thing I notice is that my slaves aren't working very hard - I'm obviously not using the appropriate whips :)

The two slaves are quad-processor Xeon boxes, one at 2.8GHz and one at 3.0GHz. The load as reported by Ganglia is typically about 0.5 (out of 4.0), though it occasionally spikes to 1.0.

The master (also a quad 3.0GHz) is even more of a slacker, occasionally spiking to 1.0 out of 4.0, but most of the time doing nothing as it waits for the slaves to complete their jobs.

I figured as much for the master, but what can I do to get more from my slaves?

Right now I'm using the default settings from the 1/12/2006 build of Nutch. The interesting ones are:

 * mapred.tasktracker.tasks.maximum = 2
 * fetcher.threads.fetch = 10

Plus some settings gleaned (I think) from Doug's example:

 * mapred.map.tasks = 1000
 * mapred.reduce.tasks = 39
 * mapred.child.heap.size = 500m

I assume that mapred.reduce.tasks should be 3, not 39, since I've only got 2 slaves, right?

Should I be boosting mapred.tasktracker.tasks.maximum to 4?
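
For reference, here is the back-of-the-envelope arithmetic behind that question, as a small Python sketch. It rests on two assumptions that I haven't verified: that each slave's tasktracker runs at most mapred.tasktracker.tasks.maximum tasks at once, and that each fetch task runs fetcher.threads.fetch threads. The function name and dictionary keys are made up for illustration.

```python
# Rough slot arithmetic for a 2-slave cluster.
# Assumption: each tasktracker runs at most `tasks_maximum` tasks at once,
# and each fetch task runs `fetch_threads` fetcher threads.

def cluster_capacity(slaves, tasks_maximum, fetch_threads):
    """Return concurrent task slots and total fetch threads cluster-wide."""
    slots = slaves * tasks_maximum
    return {"task_slots": slots, "fetch_threads": slots * fetch_threads}

# Current settings: tasks.maximum = 2, fetcher.threads.fetch = 10
current = cluster_capacity(slaves=2, tasks_maximum=2, fetch_threads=10)

# Proposed: tasks.maximum = 4 (one task per processor on each slave)
proposed = cluster_capacity(slaves=2, tasks_maximum=4, fetch_threads=10)

print("current: ", current)
print("proposed:", proposed)
```

If those assumptions hold, the current settings give 4 concurrent tasks across the cluster, and the change would give 8, which at least matches the number of processors on the slaves.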

Any other ideas? I'm trying to prepare for another run once this one has had a chance to generate some interesting results.

Thanks,

-- Ken
--
Ken Krugler
