Hi all,

I am running the KMeans algorithm from mahout-0.4 on a Hadoop (0.20.2) cluster.

Each node in my cluster has a quad-core processor, so I wanted to run up to
3 map and 3 reduce tasks on each node (leaving 1 core for the datanode and
tasktracker services).
To do this I set the following properties:
mapred.tasktracker.map.tasks.maximum = 3
mapred.tasktracker.reduce.tasks.maximum = 3
mapred.map.tasks = 3 * (number of nodes)
mapred.reduce.tasks = 3 * (number of nodes)

I tested this on a 2-node and a 6-node cluster, but in both cases only 5 map
tasks and 2 reduce tasks are launched in total. On the 2-node cluster this
uses ~3 cores on each node, but on the 6-node cluster it leaves resources
underutilized, with only ~1 core on each node in use.
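
To put rough numbers on this, here is a small sketch of the arithmetic (Python,
for illustration only). The function name slot_usage is made up, and the sketch
assumes all launched map and reduce tasks could run at the same time, which is
a simplification.

```python
# Per-node limits I configured (mapred.tasktracker.*.tasks.maximum).
MAP_SLOTS_PER_NODE = 3
REDUCE_SLOTS_PER_NODE = 3

# Task counts actually observed on both clusters.
LAUNCHED_MAPS = 5
LAUNCHED_REDUCES = 2


def slot_usage(nodes):
    """Return slot capacity and rough usage for a cluster of `nodes` nodes.

    Hypothetical helper: it only does the arithmetic and does not query Hadoop.
    """
    map_slots = MAP_SLOTS_PER_NODE * nodes
    reduce_slots = REDUCE_SLOTS_PER_NODE * nodes
    return {
        "map_slots": map_slots,
        "reduce_slots": reduce_slots,
        # Fraction of configured slots that the launched tasks can occupy.
        "map_slot_use": LAUNCHED_MAPS / map_slots,
        "reduce_slot_use": LAUNCHED_REDUCES / reduce_slots,
        # Average number of tasks (map + reduce) per node.
        "tasks_per_node": (LAUNCHED_MAPS + LAUNCHED_REDUCES) / nodes,
    }


if __name__ == "__main__":
    for nodes in (2, 6):
        print(nodes, slot_usage(nodes))
```

The tasks_per_node figure is where my "~3 cores" and "~1 core" estimates come
from: 7 tasks spread over 2 nodes versus 7 tasks spread over 6 nodes.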

Could someone explain why the same fixed number of tasks (5 map, 2 reduce)
is launched in both cases?
My guess is that the number of map and reduce tasks depends on the input data
for the KMeans algorithm (sorry, I did not test with different input data).
If so, how can I make proper use of the 6-node cluster?


Regards
Lokendra
