Your NodeManager can use 2048 MB (yarn.nodemanager.resource.memory-mb) for allocating containers.
If you run a map task, you need 768 MB (mapreduce.map.memory.mb). If you run a reduce task, you need 1024 MB (mapreduce.reduce.memory.mb). If you run the MapReduce application master, you need 1024 MB (yarn.app.mapreduce.am.resource.mb).

Therefore, when you run a MapReduce job, you can run only 2 containers per NodeManager on your setup, since a third map container would not fit (3 x 768 = 2304 > 2048).

2014-09-12 20:37 GMT+02:00 Jakub Stransky <[email protected]>:

> I thought that memory assigned has to be a multiple of
> yarn.scheduler.minimum-allocation-mb and is rounded accordingly.

That's right. It also specifies the minimum size of a container, which prevents requests for unreasonably small containers (these are likely to cause task failures).

> Any others I am not aware of. Are there any additional parameters like the
> ones you mentioned which should be set?

There are also settings related to vcores in mapred-site.xml and yarn-site.xml. But they don't change anything in your case (as you are limited by memory, not vcores).

> The job wasn't the smallest but wasn't PB of data. It was run on 1.5 GB of
> data and ran for 60 min. I wasn't able to make any significant improvement.
> It is a map-only job, and I wasn't able to achieve more than 30% of total
> machine CPU utilization. However, the top command was displaying 100 %cpu
> for the process running on the data node; that's why I was thinking about a
> limit on the container process. I didn't find any other bound like I/O,
> network or memory.

CPU utilization depends on the type of your jobs (e.g. doing complex math operations or just counting words) and on the number of containers you run. If you want to experiment with this, you can run more CPU-bound jobs or increase the number of containers running on a node.
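
For reference, the container arithmetic at the top of this message can be sketched in a few lines of Python. This is only an illustration, not scheduler code: the function names are made up, the rounding rule is the one described in this thread (requests are rounded up to a multiple of yarn.scheduler.minimum-allocation-mb), and the minimum-allocation values of 256 and 1024 are assumptions, since the thread does not state the actual setting.

```python
import math


def normalized_request(request_mb, min_alloc_mb):
    """Round a container request up to a multiple of the minimum allocation.

    Hypothetical helper; mirrors the rounding described in the thread.
    """
    multiples = math.ceil(request_mb / min_alloc_mb)
    return max(1, multiples) * min_alloc_mb


def containers_per_node(node_mb, request_mb, min_alloc_mb):
    """Number of equally sized containers that fit into one NodeManager."""
    return node_mb // normalized_request(request_mb, min_alloc_mb)


if __name__ == "__main__":
    node_mb = 2048  # yarn.nodemanager.resource.memory-mb
    map_mb = 768    # mapreduce.map.memory.mb
    # Assumed values for yarn.scheduler.minimum-allocation-mb.
    for min_alloc_mb in (256, 1024):
        size = normalized_request(map_mb, min_alloc_mb)
        count = containers_per_node(node_mb, map_mb, min_alloc_mb)
        print(f"min-alloc {min_alloc_mb} MB: {size} MB per map container, "
              f"{count} per node")
```

With either assumed minimum allocation, only 2 map containers fit in 2048 MB, so getting more parallelism per node means raising yarn.nodemanager.resource.memory-mb or lowering mapreduce.map.memory.mb.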
