Re: CPU utilization

Adam Kawa Fri, 12 Sep 2014 11:59:23 -0700

Your NodeManager can use 2048 MB (yarn.nodemanager.resource.memory-mb) for
allocating containers.


If you run a map task, you need 768 MB (mapreduce.map.memory.mb).
If you run a reduce task, you need 1024 MB (mapreduce.reduce.memory.mb).
If you run the MapReduce app master, you need 1024 MB
(yarn.app.mapreduce.am.resource.mb).

Therefore, when you run a MapReduce job, you can run only 2 containers per
NodeManager on your setup (3 x 768 = 2304 > 2048).
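
The arithmetic above can be checked with a short Python sketch. It uses only
the values quoted in this thread; the function names are made up for
illustration, and it ignores any rounding the scheduler may apply to requests.

```python
# Values quoted in this thread, all in MB.
NODE_MEMORY_MB = 2048    # yarn.nodemanager.resource.memory-mb
MAP_MEMORY_MB = 768      # mapreduce.map.memory.mb
REDUCE_MEMORY_MB = 1024  # mapreduce.reduce.memory.mb
AM_MEMORY_MB = 1024      # yarn.app.mapreduce.am.resource.mb


def max_containers(node_mb, container_mb):
    """How many containers of one size fit into the node's memory (hypothetical helper)."""
    return node_mb // container_mb


def fits(node_mb, requests_mb):
    """True if the given mix of container requests fits on one node (hypothetical helper)."""
    return sum(requests_mb) <= node_mb


if __name__ == "__main__":
    # Map containers only: 2 fit, a third would need 2304 MB.
    print(max_containers(NODE_MEMORY_MB, MAP_MEMORY_MB))
    # A node that also hosts the app master has room for one map task, not two.
    print(fits(NODE_MEMORY_MB, [AM_MEMORY_MB, MAP_MEMORY_MB]))
    print(fits(NODE_MEMORY_MB, [AM_MEMORY_MB, MAP_MEMORY_MB, MAP_MEMORY_MB]))
```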

2014-09-12 20:37 GMT+02:00 Jakub Stransky <[email protected]>:


>  I thought that memory assigned has to be a multiple of
> yarn.scheduler.minimum-allocation-mb and is rounded accordingly.
>

That's right. It also specifies the minimum size of a container, which
prevents requests for unreasonably small containers (which are likely to
cause task failures).
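
As a rough illustration of that rounding, here is a Python sketch that rounds
a request up to the next multiple of the minimum allocation. The function name
and the minimum values used in the example (256, 512, 1024) are assumptions
for illustration only; your actual yarn.scheduler.minimum-allocation-mb is not
stated in this thread, and the exact normalization can differ between
schedulers and Hadoop versions.

```python
def normalize_request(requested_mb, minimum_allocation_mb):
    """Round a memory request up to a multiple of the minimum allocation.

    Hypothetical helper: a request below the minimum is raised to the minimum.
    """
    if minimum_allocation_mb <= 0:
        raise ValueError("minimum_allocation_mb must be positive")
    # Ceiling division without floating point.
    multiples = -(-requested_mb // minimum_allocation_mb)
    return max(multiples, 1) * minimum_allocation_mb


if __name__ == "__main__":
    # 768 MB is the map task size from this thread; the minimums are assumed.
    for minimum in (256, 512, 1024):
        print(minimum, normalize_request(768, minimum))
```

If the minimum were 512 or 1024, a 768 MB map request would occupy 1024 MB
under this assumption, which would still allow 2 containers in 2048 MB.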

>
> I am not aware of any others. Are there any additional parameters, like
> the ones you mentioned, that should be set?
>

There are also settings related to vcores in mapred-site.xml and
yarn-site.xml. But they don't change anything in your case (as you are
limited by the memory, not vcores).
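
To see why memory is the limiting resource here, the following Python sketch
takes the smaller of the memory-based and vcore-based container counts. The
memory values are the ones from this thread; the vcore numbers (8 per node,
1 per container) are purely hypothetical, since your vcore settings were not
posted, and whether vcores are enforced at all depends on the scheduler
configuration.

```python
def containers_per_node(node_mb, node_vcores, container_mb, container_vcores):
    """Return (container count, limiting resource) for one node (hypothetical helper)."""
    by_memory = node_mb // container_mb
    by_vcores = node_vcores // container_vcores
    limiting = "memory" if by_memory <= by_vcores else "vcores"
    return min(by_memory, by_vcores), limiting


if __name__ == "__main__":
    # 2048 MB node and 768 MB map containers are from this thread;
    # 8 node vcores and 1 vcore per container are assumed values.
    print(containers_per_node(2048, 8, 768, 1))
```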


> The job wasn't the smallest, but it wasn't PB of data. It was run on 1.5 GB
> of data and ran for 60 min. I wasn't able to make any significant
> improvement. It is a map-only job, and I wasn't able to achieve more than
> 30% of total machine CPU utilization. However, the top command was
> displaying 100 %CPU for the process running on the data node, which is why
> I was thinking about a limit on the container process. I didn't find any
> other limit such as I/O, network, or memory.
>
>

CPU utilization depends on the type of your jobs (e.g. doing complex math
operations or just counting words) and the number of containers you run. If
you want to play with this, you can run more CPU-bound jobs or increase the
number of containers running on a node.
