Hey, I'm a bit of a Mahout/Hadoop newbie myself, but from what I know, the number of map tasks is determined solely by the input: one map task per input split. You can give it a hint via mapred.map.tasks, but it's only a hint. To change the number of map tasks, you need to lower dfs.block.size and mapred.max.split.size so that splits come out smaller than the default 64M block (the value has to be a multiple of 512).
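To make the split arithmetic concrete, here is a rough Python sketch of how I understand the split count to be derived from file size and split size. The 300 MB input size is purely an assumption for illustration (the actual input size isn't stated in this thread), the function names are made up, and the 1.1 "slop" factor is my reading of how FileInputFormat treats the last split, so take it as an approximation rather than the real implementation.

```python
MB = 1024 * 1024

# Assumption: Hadoop lets the final split run up to ~10% over the split size
# instead of creating a tiny trailing split.
SPLIT_SLOP = 1.1


def effective_split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Split size is the block size, clamped between the min and max split sizes."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_size, split_size):
    """Approximate number of splits (and so map tasks) for one input file."""
    remaining = file_size
    splits = 0
    while remaining / split_size > SPLIT_SLOP:
        remaining -= split_size
        splits += 1
    if remaining > 0:
        splits += 1  # whatever is left becomes the last split
    return splits


if __name__ == "__main__":
    input_size = 300 * MB  # hypothetical input size, not taken from the thread
    for size_mb in (64, 17):
        size = effective_split_size(size_mb * MB)
        print(size_mb, "MB splits ->", count_splits(input_size, size), "map tasks")
```

With several input files the counts add up per file, so the real number can come out a little higher than this single-file estimate.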
So it seems that 64M generated only 5 map tasks, when you want a total of 18 (3 map tasks on each of 6 machines). A block size of roughly a quarter of that, around 17M, would get you about 18 map tasks (-Ddfs.block.size=17825792 -Dmapred.max.split.size=17825792). I don't know if this is generally advised by Mahout users, but it should help.

The number of reducers can be set explicitly to 18: -Dmapred.reduce.tasks=18. However, you did set mapred.reduce.tasks to 3*(no of nodes) ... are you sure that value is in all the nodes' conf files?

-- james

On Wed, Jan 19, 2011 at 12:49 PM, Lokendra Singh <[email protected]> wrote:

> Hi all,
>
> I am running KMeans algorithm from mahout-0.4 on a Hadoop (0.20.2) cluster.
>
> Each node in my cluster has a Quad-core processor, hence I wished to launch
> 3 map and 3 reduce tasks on each node (1 core left for data-node and
> tasktracker services).
> Hence I set the properties :
> mapred.tasktracker.map.tasks.maximum
> & mapred.tasktracker.reduce.tasks.maximum to 3
> and
> mapred.map.tasks and mapred.reduce.tasks to 3*(no of nodes)
>
> I tested running it on a 2 node and 6 node cluster, but in both cases only
> total 5 map tasks & total 2 reducers are launched, which in case of 2 node
> cluster utilizes ~3 cores on each node but it leads to underutilization of
> resources in case of a 6 node cluster, where only ~1 core of each node is
> used.
>
> Please explain this behavior of these fixed no of map-reduce (5,2) tasks
> being launched in both the cases.
> I am guessing it to depends upon the input data for KMeans algorithm to
> select the optimum number of map-red tasks (sorry, i did not test with
> different input data). In that case, how to properly utilize the 6-node
> cluster.
>
>
> Regards
> Lokendra
>
