Hi Adam,

thanks for your response. I thought that the assigned memory has to be a multiple of yarn.scheduler.minimum-allocation-mb and is rounded accordingly.
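To make that assumption concrete, here is a rough Python sketch of how I understand the rounding. It assumes the scheduler rounds each request up to the next multiple of yarn.scheduler.minimum-allocation-mb and caps it at yarn.scheduler.maximum-allocation-mb. The function name normalize_mb is made up for illustration and is not a Hadoop API; the numbers are the ones from my configuration.

```python
def normalize_mb(requested_mb, min_alloc_mb, max_alloc_mb):
    """Round a memory request up to the next multiple of min_alloc_mb,
    capped at max_alloc_mb (assumed scheduler behaviour, not a Hadoop call)."""
    multiples = -(-requested_mb // min_alloc_mb)  # integer ceiling division
    return min(multiples * min_alloc_mb, max_alloc_mb)


NODE_MB = 2048  # yarn.nodemanager.resource.memory-mb
MIN_MB = 256    # yarn.scheduler.minimum-allocation-mb
MAX_MB = 2048   # yarn.scheduler.maximum-allocation-mb

# Container sizes requested in mapred-site.xml
requests = {
    "mapreduce.map.memory.mb": 768,
    "mapreduce.reduce.memory.mb": 1024,
    "yarn.app.mapreduce.am.resource.mb": 1024,
}

for name, requested in requests.items():
    granted = normalize_mb(requested, MIN_MB, MAX_MB)
    per_node = NODE_MB // granted  # how many such containers fit on one node
    print(f"{name}: requested={requested} granted={granted} per_node={per_node}")
```

With these values every request is already a multiple of 256, so nothing gets rounded, and at most two containers of either size fit into 2048 MB per node, which matches your estimate.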
I am setting just the properties mentioned, that is:

# hadoop - yarn-site.xml
yarn.nodemanager.resource.memory-mb : 2048
yarn.scheduler.minimum-allocation-mb : 256
yarn.scheduler.maximum-allocation-mb : 2048

# hadoop - mapred-site.xml
mapreduce.map.memory.mb : 768
mapreduce.map.java.opts : -Xmx512m
mapreduce.reduce.memory.mb : 1024
mapreduce.reduce.java.opts : -Xmx768m
mapreduce.task.io.sort.mb : 100
yarn.app.mapreduce.am.resource.mb : 1024
yarn.app.mapreduce.am.command-opts : -Xmx768m

I am not aware of any others. Are there any additional parameters, like the one you mentioned, that should be set?

The job wasn't the smallest, but it wasn't PBs of data either. It ran on 1.5 GB of data and took 60 minutes. I wasn't able to make any significant improvement. It is a map-only job, and I wasn't able to achieve more than 30% of total machine CPU utilization. However, the top command was displaying 100 %CPU for the process running on the data node, which is why I was thinking there might be a limit on the container process. I didn't find any other bottleneck such as I/O, network, or memory.

Thanks for any help or clarification
Jakub

On 12 September 2014 18:23, Adam Kawa <[email protected]> wrote:

> Hi,
>
> With these settings, your are able to start 2 containers maximally per
> NodeManager (yarn.nodemanager.resource.memory-mb = 2048). The size of
> your containers is between 768 - 1024 MBs (not sure what is your value of
> yarn.nodemanager.resource.cpu-vcores).
> Have you tried to run more (or bigger) jobs on the cluster concurrently?
> Then you might see higher CPU utilization than 30%.
>
> Cheers!
> Adam
>
> 2014-09-12 17:51 GMT+02:00 Jakub Stransky <[email protected]>:
>
>> Hello experienced hadoop users,
>>
>> I have one beginners question regarding cpu utilization on datanodes when
>> running MR job.
>> Cluster of 5 machines, 2NN +3 DN really inexpensive hw
>> using following parameters:
>> # hadoop - yarn-site.xml
>> yarn.nodemanager.resource.memory-mb : 2048
>> yarn.scheduler.minimum-allocation-mb : 256
>> yarn.scheduler.maximum-allocation-mb : 2048
>>
>> # hadoop - mapred-site.xml
>> mapreduce.map.memory.mb : 768
>> mapreduce.map.java.opts : -Xmx512m
>> mapreduce.reduce.memory.mb : 1024
>> mapreduce.reduce.java.opts : -Xmx768m
>> mapreduce.task.io.sort.mb : 100
>> yarn.app.mapreduce.am.resource.mb : 1024
>> yarn.app.mapreduce.am.command-opts : -Xmx768m
>>
>> and I have map only task which uses 3 mappers which are essentially
>> distributed across the cluster - 1 task per dn. What I see on the cluster
>> nodes is that cpu utilization doesn't overcome 30%.
>>
>> Am I right and hadoop do really limit all the resources per container
>> bases? I wasn't able to find any command/setting which would prove this
>> theory. ulimit for yarn were unlimited, etc.
>>
>> Not sure if I am missing something here
>>
>> Thanks for providing more insight into resource planning and utilization
>> Jakub
>>
>

--
Jakub Stransky
cz.linkedin.com/in/jakubstransky
