Re: CPU utilization

Jakub Stransky Fri, 12 Sep 2014 11:38:46 -0700

Hi Adam,

Thanks for your response. I thought the memory assigned had to be a multiple
of yarn.scheduler.minimum-allocation-mb and was rounded up accordingly.
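A minimal sketch of the rounding described above. It assumes the CapacityScheduler's default behaviour as I understand it, where a request is rounded up to the next multiple of yarn.scheduler.minimum-allocation-mb. The FairScheduler uses a separate increment setting, so treat this as an assumption. The helper name normalize_request_mb is hypothetical and is not a YARN API. Rejecting requests above the maximum is also a simplification.

```python
import math


def normalize_request_mb(request_mb: int, min_alloc_mb: int, max_alloc_mb: int) -> int:
    """Round a container memory request up to a multiple of min_alloc_mb.

    Hypothetical helper modelling the rounding assumed in this thread;
    it is not part of any YARN API.
    """
    if request_mb > max_alloc_mb:
        # Assumption: treat over-maximum requests as invalid instead of capping them.
        raise ValueError(f"{request_mb} MB exceeds maximum allocation {max_alloc_mb} MB")
    # Never grant less than one minimum-sized allocation.
    multiples = max(1, math.ceil(request_mb / min_alloc_mb))
    return multiples * min_alloc_mb


# Values from this thread: minimum 256 MB, maximum 2048 MB.
for name, mb in [("map", 768), ("reduce", 1024), ("am", 1024)]:
    print(name, normalize_request_mb(mb, 256, 2048))
```

With a 256 MB minimum, both 768 and 1024 are already exact multiples, so rounding would leave them unchanged. A 700 MB request, by contrast, would become 768 MB.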


I am setting only the properties mentioned, that is:
# hadoop - yarn-site.xml
yarn.nodemanager.resource.memory-mb  : 2048
yarn.scheduler.minimum-allocation-mb : 256
yarn.scheduler.maximum-allocation-mb : 2048

# hadoop - mapred-site.xml
mapreduce.map.memory.mb              : 768
mapreduce.map.java.opts              : -Xmx512m
mapreduce.reduce.memory.mb           : 1024
mapreduce.reduce.java.opts           : -Xmx768m
mapreduce.task.io.sort.mb            : 100
yarn.app.mapreduce.am.resource.mb    : 1024
yarn.app.mapreduce.am.command-opts   : -Xmx768m
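A quick sanity check of the values above. Each -Xmx heap has to fit inside its container's memory, with room left for non-heap JVM overhead. The sketch only computes the heap-to-container ratio for each pair. The parse_xmx_mb helper is hypothetical and only understands the m/g suffixes used in this thread.

```python
import re


def parse_xmx_mb(opts: str) -> int:
    """Extract the -Xmx value in MB from a JVM options string (m/g suffixes only)."""
    match = re.search(r"-Xmx(\d+)([mMgG])", opts)
    if match is None:
        raise ValueError(f"no -Xmx setting in {opts!r}")
    value, unit = int(match.group(1)), match.group(2).lower()
    return value * 1024 if unit == "g" else value


# (container memory property, container MB, JVM options) as listed above.
settings = [
    ("mapreduce.map.memory.mb", 768, "-Xmx512m"),
    ("mapreduce.reduce.memory.mb", 1024, "-Xmx768m"),
    ("yarn.app.mapreduce.am.resource.mb", 1024, "-Xmx768m"),
]

for prop, container_mb, opts in settings:
    heap_mb = parse_xmx_mb(opts)
    ratio = heap_mb / container_mb
    print(f"{prop}: heap {heap_mb} MB of {container_mb} MB ({ratio:.0%}), "
          f"{container_mb - heap_mb} MB left for non-heap memory")
```

All three pairs leave 256 MB outside the heap. The 100 MB mapreduce.task.io.sort.mb buffer is allocated inside the map task's 512 MB heap, so it fits as well.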

I am not aware of any others. Are there any additional parameters, like the
one you mentioned, that should be set? The job wasn't the smallest, but it
wasn't petabytes of data either: it ran on 1.5 GB of data for 60 minutes. I
wasn't able to make any significant improvement. It is a map-only job, and I
wasn't able to achieve more than 30% of total machine CPU utilization.
However, the top command was displaying 100 %CPU for the process running on
the data node. That's why I suspected a per-container process limit. I
didn't find any other bottleneck such as I/O, network or memory.
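One possible reading of the 100% vs 30% numbers, offered as a sketch rather than a diagnosis. In its default mode, top reports a process's %CPU relative to a single core. A single busy, mostly single-threaded map task therefore shows about 100%, while the machine as a whole is only 1/cores busy. The core counts below are assumptions, since the thread does not state them.

```python
def machine_cpu_percent(busy_threads: int, cores: int) -> float:
    """Machine-wide CPU utilization when `busy_threads` threads each saturate one core."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    # Each thread can use at most one core, and the total cannot exceed all cores.
    return 100.0 * min(busy_threads, cores) / cores


# One map task per DataNode, with hypothetical core counts.
for cores in (2, 3, 4):
    print(f"{cores} cores: top shows ~100% for the task, "
          f"machine-wide ~{machine_cpu_percent(1, cores):.0f}%")
```

Under that assumption, a ceiling near 30% would be consistent with a 3- or 4-core node running one such task (about 33% or 25%). Running more tasks per node at the same time would be what raises the machine-wide figure.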

Thanks for any help or clarification
Jakub


On 12 September 2014 18:23, Adam Kawa <[email protected]> wrote:

> Hi,
>
> With these settings, you are able to start at most 2 containers per
> NodeManager (yarn.nodemanager.resource.memory-mb = 2048). The size of
> your containers is between 768 and 1024 MB (not sure what your value of
> yarn.nodemanager.resource.cpu-vcores is).
> Have you tried to run more (or bigger) jobs on the cluster concurrently?
> Then you might see higher CPU utilization than 30%.
>
> Cheers!
> Adam
>
> 2014-09-12 17:51 GMT+02:00 Jakub Stransky <[email protected]>:
>
>> Hello experienced Hadoop users,
>>
>> I have a beginner's question regarding CPU utilization on DataNodes when
>> running an MR job. The cluster has 5 machines (2 NN + 3 DN) on really
>> inexpensive hardware, using the following parameters:
>> # hadoop - yarn-site.xml
>> yarn.nodemanager.resource.memory-mb  : 2048
>> yarn.scheduler.minimum-allocation-mb : 256
>> yarn.scheduler.maximum-allocation-mb : 2048
>>
>> # hadoop - mapred-site.xml
>> mapreduce.map.memory.mb              : 768
>> mapreduce.map.java.opts              : -Xmx512m
>> mapreduce.reduce.memory.mb           : 1024
>> mapreduce.reduce.java.opts           : -Xmx768m
>> mapreduce.task.io.sort.mb            : 100
>> yarn.app.mapreduce.am.resource.mb    : 1024
>> yarn.app.mapreduce.am.command-opts   : -Xmx768m
>>
>> I have a map-only job which uses 3 mappers, distributed across the
>> cluster with 1 task per DN. What I see on the cluster nodes is that CPU
>> utilization doesn't exceed 30%.
>>
>> Am I right that Hadoop really limits all resources on a per-container
>> basis? I wasn't able to find any command or setting that would prove this
>> theory; the ulimits for the yarn user were unlimited, etc.
>>
>> Not sure if I am missing something here
>>
>> Thanks for providing more insight into resource planning and utilization
>> Jakub
>>
>>
>
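Adam's count of at most 2 containers per NodeManager follows from integer division of yarn.nodemanager.resource.memory-mb by the container size. The sketch extends that arithmetic to the 3 DataNodes. It assumes memory is the only resource the scheduler considers, as with the default memory-only resource calculator. It also assumes the 1024 MB MRAppMaster runs on one of those nodes. The function names are hypothetical.

```python
NODE_MB = 2048     # yarn.nodemanager.resource.memory-mb
MAP_MB = 768       # mapreduce.map.memory.mb
AM_MB = 1024       # yarn.app.mapreduce.am.resource.mb
DATA_NODES = 3     # 3 DN in this cluster


def containers_per_node(node_mb: int, container_mb: int) -> int:
    """How many containers of `container_mb` fit in one NodeManager's memory."""
    return node_mb // container_mb


def cluster_map_slots(nodes: int, node_mb: int, map_mb: int, am_mb: int) -> int:
    """Concurrent map containers, assuming the AM occupies part of one node."""
    am_node = containers_per_node(node_mb - am_mb, map_mb)
    other_nodes = (nodes - 1) * containers_per_node(node_mb, map_mb)
    return am_node + other_nodes


print("maps per node:", containers_per_node(NODE_MB, MAP_MB))
print("concurrent maps in cluster:",
      cluster_map_slots(DATA_NODES, NODE_MB, MAP_MB, AM_MB))
```

Under these assumptions the cluster has room for about 5 concurrent 768 MB map containers: 2 on each of two nodes and 1 beside the AM. The job's 3 mappers fill at most 3 of them. Running more or bigger jobs concurrently, as Adam suggests, would keep more containers, and so more cores, busy.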


-- 
Jakub Stransky
cz.linkedin.com/in/jakubstransky
