Hi,

With these settings, you can start at most 2 containers per NodeManager (yarn.nodemanager.resource.memory-mb = 2048), because each of your containers takes between 768 and 1024 MB. (I'm not sure what your value of yarn.nodemanager.resource.cpu-vcores is.) Have you tried running more (or bigger) jobs on the cluster concurrently? Then you might see CPU utilization higher than 30%.
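To make the arithmetic behind the "2 containers" figure explicit, here is a rough sketch in Python. It assumes the scheduler rounds each memory request up to a multiple of yarn.scheduler.minimum-allocation-mb and caps it at yarn.scheduler.maximum-allocation-mb, and it ignores vcores entirely. The helper names normalize and containers_per_node are made up for illustration and are not part of any Hadoop API.

```python
import math

def normalize(request_mb, min_alloc_mb, max_alloc_mb):
    """Round a request up to a multiple of the minimum allocation (assumed
    scheduler behaviour), then clamp it to the maximum allocation."""
    rounded = math.ceil(request_mb / min_alloc_mb) * min_alloc_mb
    return min(max(rounded, min_alloc_mb), max_alloc_mb)

def containers_per_node(node_mb, request_mb, min_alloc_mb, max_alloc_mb):
    """How many containers of one size fit into a NodeManager's memory."""
    return node_mb // normalize(request_mb, min_alloc_mb, max_alloc_mb)

# Values taken from the quoted yarn-site.xml / mapred-site.xml settings.
NODE_MB = 2048    # yarn.nodemanager.resource.memory-mb
MIN_ALLOC = 256   # yarn.scheduler.minimum-allocation-mb
MAX_ALLOC = 2048  # yarn.scheduler.maximum-allocation-mb

maps_per_node = containers_per_node(NODE_MB, 768, MIN_ALLOC, MAX_ALLOC)
reduces_per_node = containers_per_node(NODE_MB, 1024, MIN_ALLOC, MAX_ALLOC)

print("map containers per node:   ", maps_per_node)
print("reduce containers per node:", reduces_per_node)
```

Note that the MapReduce ApplicationMaster (yarn.app.mapreduce.am.resource.mb = 1024) also occupies a container on one of the nodes, so that node has even less room left for map tasks.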
Cheers!
Adam

2014-09-12 17:51 GMT+02:00 Jakub Stransky <[email protected]>:

> Hello experienced hadoop users,
>
> I have one beginners question regarding cpu utilization on datanodes when
> running MR job. Cluster of 5 machines, 2NN +3 DN really inexpensive hw
> using following parameters:
> # hadoop - yarn-site.xml
> yarn.nodemanager.resource.memory-mb : 2048
> yarn.scheduler.minimum-allocation-mb : 256
> yarn.scheduler.maximum-allocation-mb : 2048
>
> # hadoop - mapred-site.xml
> mapreduce.map.memory.mb : 768
> mapreduce.map.java.opts : -Xmx512m
> mapreduce.reduce.memory.mb : 1024
> mapreduce.reduce.java.opts : -Xmx768m
> mapreduce.task.io.sort.mb : 100
> yarn.app.mapreduce.am.resource.mb : 1024
> yarn.app.mapreduce.am.command-opts : -Xmx768m
>
> and I have map only task which uses 3 mappers which are essentially
> distributed across the cluster - 1 task per dn. What I see on the cluster
> nodes is that cpu utilization doesn't overcome 30%.
>
> Am I right and hadoop do really limit all the resources per container
> bases? I wasn't able to find any command/setting which would prove this
> theory. ulimit for yarn were unlimited, etc.
>
> Not sure if I am missing something here
>
> Thanks for providing more insight into resource planning and utilization
> Jakub
