Hi Experts,

In previous discussions, I found the following description: "mapreduce.job.ubertask.enable | (false) | 'Whether to enable the small-jobs "ubertask" optimization, which runs "sufficiently small" jobs sequentially within a single JVM. "Small" is defined by the following maxmaps, maxreduces, and maxbytes settings. Users may override this value.'"
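To make the "small" definition in the quoted description concrete, here is a minimal sketch of how I read it. This is only my simplified model, not the actual Hadoop code: the class name UberCheck, the method isSmallJob, and the threshold values are my own illustration. The three limits stand for mapreduce.job.ubertask.maxmaps, mapreduce.job.ubertask.maxreduces, and mapreduce.job.ubertask.maxbytes, and the real decision inside the MapReduce ApplicationMaster may take further conditions into account.

```java
// Simplified model of the "sufficiently small" test; NOT the real Hadoop implementation.
final class UberCheck {

    // Returns true when the job is within all three limits named in the description.
    // maxMaps, maxReduces and maxBytes stand for the values of
    // mapreduce.job.ubertask.maxmaps / maxreduces / maxbytes.
    static boolean isSmallJob(int numMaps, int numReduces, long inputBytes,
                              int maxMaps, int maxReduces, long maxBytes) {
        return numMaps <= maxMaps
            && numReduces <= maxReduces
            && inputBytes <= maxBytes;
    }

    public static void main(String[] args) {
        // Illustrative thresholds only (assumed values, chosen for this example).
        int maxMaps = 9;
        int maxReduces = 1;
        long maxBytes = 128L * 1024 * 1024;

        // 4 maps, 1 reduce, 10 MB of input: within every limit.
        System.out.println(isSmallJob(4, 1, 10L * 1024 * 1024, maxMaps, maxReduces, maxBytes)); // true

        // 20 maps exceeds maxMaps, so the job would not qualify.
        System.out.println(isSmallJob(20, 1, 10L * 1024 * 1024, maxMaps, maxReduces, maxBytes)); // false
    }
}
```

Under this reading, a job exceeding any one of the three limits would run as a normal (non-uber) job even when mapreduce.job.ubertask.enable is true.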
Based on the above description, I set "mapreduce.job.ubertask.enable" to true, configured the other uber-related parameters, ran some experiments, and came to the following understanding.

1) If I submit a bunch of small MR jobs to a Hadoop cluster (each MR job runs in uber mode):
- Each MR job corresponds to an application, like application_1383815949546_0006
- Each application has its own container, like container_1383815949546_0010_01_000001
- When a container is launched by the NodeManager, a JVM is launched too. When the container stops, the JVM stops as well. A container has only one JVM in its whole life cycle.
- Each application, such as application_1383815949546_0006, includes some map tasks and reduce tasks
- In uber mode, all the map tasks and reduce tasks of application_1383815949546_0006 are executed in the same, single container container_1383815949546_0010_01_000001. This also means that all map tasks and reduce tasks are executed in a single JVM.
- A container cannot be shared among different applications (jobs)

2) If I submit a bunch of big MR jobs to a Hadoop cluster (each MR job does NOT run in uber mode):
- Each map task and reduce task of application_1383815949546_0006 is executed in its own container. This means that application_1383815949546_0006 will have lots of containers.

I am not sure whether the above understanding is correct, so any comments/corrections will be appreciated!
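To put numbers on the two cases, here is a small sketch of the container counts I would expect per job. The class ContainerCount and its method are made up for illustration. It assumes no failed or speculative task attempts, and it counts the one container that hosts the job's ApplicationMaster, which in uber mode is also the container that runs all the tasks.

```java
// Rough model of containers used by one MR job; an illustration, not Hadoop code.
final class ContainerCount {

    // Uber mode: all tasks run inside the single ApplicationMaster container.
    // Non-uber mode: one ApplicationMaster container plus one container per task
    // (assumes every task succeeds on its first attempt, with no speculation).
    static int containers(boolean uber, int numMaps, int numReduces) {
        return uber ? 1 : 1 + numMaps + numReduces;
    }

    public static void main(String[] args) {
        System.out.println(containers(true, 4, 1));  // 1
        System.out.println(containers(false, 4, 1)); // 6
    }
}
```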
