You need to differentiate slots from tasks. Tasks are spawned by the TaskTracker (TT) and run in the free slots it offers to the cluster. The number of map tasks for a Hadoop job is typically determined by the input data size and the split size. The number of reduce tasks is controlled by the *mapreduce.job.reduces* parameter; if it is not set, a job gets one reduce task.

The number of map and reduce slots on each TaskTracker node is controlled by the *mapreduce.tasktracker.map.tasks.maximum* and *mapreduce.tasktracker.reduce.tasks.maximum* Hadoop properties in the mapred-site.xml file. These define the maximum number of concurrently occupied slots on a TaskTracker node, i.e. the degree of concurrency on each TaskTracker.

Finally, consider the memory usage of each map and/or reduce task. Task heap sizes are usually controlled by the *mapred.child.java.opts* Hadoop parameter. If your Hadoop jobs are memory-intensive and have large JVM heaps, reduce the number of slots; if they have small heaps, you may be able to increase it. Keep in mind the total amount of memory the task JVMs would consume if all slots were filled.
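
As a rough sketch, the relevant mapred-site.xml entries look like this (the slot counts and heap size are example values I picked for illustration, not recommendations; tune them to your node's cores and RAM):

<!-- mapred-site.xml: example values only -->
<property>
  <name>mapreduce.tasktracker.map.tasks.maximum</name>
  <value>8</value>    <!-- up to 8 concurrent map tasks on this TT -->
</property>
<property>
  <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>    <!-- up to 4 concurrent reduce tasks on this TT -->
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>    <!-- ~1 GB heap per task JVM -->
</property>

With these example values, a fully loaded node runs 8 + 4 = 12 task JVMs, so roughly 12 GB of heap in the worst case, plus JVM and OS overhead; check that against the node's physical memory. *mapreduce.job.reduces*, by contrast, is usually set per job, e.g. with -D mapreduce.job.reduces=N on the command line or job.setNumReduceTasks(N) in the driver.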
2014-04-14 14:18 GMT-03:00 Shashidhar Rao <[email protected]>:

> Hi,
>
> Can somebody clarify what are map and reduce slots and how Hadoop
> calculates these slots. Are these slots calculated based on the number of
> splits?
>
> I am getting different answers please help
>
> Regards
> Shashidhar
