Olive g wrote:
Is hadoop/nutch scalable at all, or can I tune some other parameters?

I'm not sure what you're asking. How long does it take to run this on a single machine? My guess is that it's much longer. So things are scaling: they're running faster when more hardware is added. In all cases you're using the same number of machines, but varying parameters and seeing different performance, as one would expect. For your current configuration, indexing appears fastest when the number of reduce tasks equals the number of nodes.
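
For example, on a four-node cluster you might set something like this in hadoop-site.xml (the node count here is purely illustrative; use your own):

    <property>
      <name>mapred.reduce.tasks</name>
      <!-- illustrative: set this to the number of nodes in the cluster -->
      <value>4</value>
    </property>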

I already have:
mapred.map.tasks set to 100,
mapred.job.tracker set to something other than local,
mapred.tasktracker.tasks.maximum set to 2,
and everything else at its default.
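
In hadoop-site.xml form those settings look roughly like this (the jobtracker host:port is just a placeholder):

    <property>
      <name>mapred.map.tasks</name>
      <value>100</value>
    </property>
    <property>
      <name>mapred.job.tracker</name>
      <!-- placeholder host:port; anything other than "local" runs distributed -->
      <value>jobtracker.example.com:9001</value>
    </property>
    <property>
      <name>mapred.tasktracker.tasks.maximum</name>
      <value>2</value>
    </property>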

How are you storing things?  Are you using dfs?

Are your nodes single-cpu or dual-cpu? My guess is single-cpu, in which case you might see more consistent performance with mapred.tasktracker.tasks.maximum=1.
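
If you try that, the change would look roughly like this in hadoop-site.xml:

    <property>
      <name>mapred.tasktracker.tasks.maximum</name>
      <!-- one concurrent task per single-cpu node -->
      <value>1</value>
    </property>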

How many disks do you have per node? If you have multiple drives, then configuring mapred.local.dir to contain a list of directories, one per drive, might make things faster.
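
For example, assuming two drives mounted at /d1 and /d2 (the paths are just placeholders):

    <property>
      <name>mapred.local.dir</name>
      <!-- comma-separated list, one directory per physical drive -->
      <value>/d1/hadoop/mapred/local,/d2/hadoop/mapred/local</value>
    </property>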

Doug
