Hi all, I am running the KMeans algorithm from mahout-0.4 on a Hadoop (0.20.2) cluster.
Each node in my cluster has a quad-core processor, so I wanted to launch 3 map and 3 reduce tasks on each node (leaving 1 core for the DataNode and TaskTracker services). I therefore set mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum to 3, and mapred.map.tasks and mapred.reduce.tasks to 3*(number of nodes).

I tested this on a 2-node and a 6-node cluster, but in both cases only 5 map tasks and 2 reduce tasks in total were launched. On the 2-node cluster this uses ~3 cores on each node, but on the 6-node cluster the resources are underutilized, with only ~1 core of each node in use.

Could someone please explain why the same fixed number of tasks (5 map, 2 reduce) is launched in both cases? My guess is that the number of map and reduce tasks depends on the input data for the KMeans algorithm (sorry, I did not test with different input data). If that is the case, how can I properly utilize the 6-node cluster?

Regards,
Lokendra
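
P.S. To make the numbers above concrete, here is a minimal Python sketch of the slot arithmetic. The helper name slot_usage and the constants are only illustrative assumptions, not anything from Hadoop or Mahout; the inputs (3 map slots per node, 5 map tasks observed) are the figures from my runs.

```python
def slot_usage(nodes, slots_per_node, tasks_launched):
    """Return (total slots, fraction of slots used, average tasks per node)."""
    total_slots = nodes * slots_per_node
    return total_slots, tasks_launched / total_slots, tasks_launched / nodes


# Values taken from the post; the names are hypothetical.
MAP_SLOTS_PER_NODE = 3   # mapred.tasktracker.map.tasks.maximum
OBSERVED_MAP_TASKS = 5   # map tasks actually launched on both clusters

for nodes in (2, 6):
    total, used, per_node = slot_usage(nodes, MAP_SLOTS_PER_NODE, OBSERVED_MAP_TASKS)
    print(f"{nodes} nodes: {total} map slots, {used:.0%} used, "
          f"{per_node:.1f} map tasks per node")
```

With 5 map tasks, the 2-node cluster fills 5 of its 6 map slots, while the 6-node cluster fills only 5 of 18. The same calculation can be repeated for the 2 reduce tasks.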
