Hello experienced Hadoop users,

I have a beginner's question about CPU utilization on the DataNodes when running a MapReduce job. The cluster has 5 machines (2 NameNodes and 3 DataNodes) on really inexpensive hardware, with the following parameters:

# hadoop - yarn-site.xml
yarn.nodemanager.resource.memory-mb : 2048
yarn.scheduler.minimum-allocation-mb : 256
yarn.scheduler.maximum-allocation-mb : 2048
# hadoop - mapred-site.xml
mapreduce.map.memory.mb : 768
mapreduce.map.java.opts : -Xmx512m
mapreduce.reduce.memory.mb : 1024
mapreduce.reduce.java.opts : -Xmx768m
mapreduce.task.io.sort.mb : 100
yarn.app.mapreduce.am.resource.mb : 1024
yarn.app.mapreduce.am.command-opts : -Xmx768m

I have a map-only job that uses 3 mappers, which end up distributed across the cluster, one task per DataNode. What I see on the cluster nodes is that CPU utilization never goes above 30%. Am I right that Hadoop really limits all resources on a per-container basis? I wasn't able to find any command or setting that would confirm this theory; ulimit for the yarn user is unlimited, etc.

I'm not sure whether I am missing something here. Thanks for any further insight into resource planning and utilization.

Jakub
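The memory arithmetic implied by the settings above can be sketched in Python. This is a minimal illustration under stated assumptions, not a model of the real scheduler: it assumes containers are sized by memory only (the CapacityScheduler's default resource calculator ignores vcores), that each request is rounded up to a multiple of yarn.scheduler.minimum-allocation-mb, and that the MapReduce ApplicationMaster occupies memory on one of the three DataNodes. The helper names `normalize`, `map_slots` and `cluster_map_slots` are hypothetical.

```python
import math

# Values copied from the settings above (all in MB).
NODE_MEMORY_MB = 2048   # yarn.nodemanager.resource.memory-mb
MIN_ALLOC_MB = 256      # yarn.scheduler.minimum-allocation-mb
MAP_MB = 768            # mapreduce.map.memory.mb
AM_MB = 1024            # yarn.app.mapreduce.am.resource.mb
DATANODES = 3


def normalize(request_mb: int) -> int:
    """Round a request up to a multiple of the minimum allocation (assumed behaviour)."""
    return max(math.ceil(request_mb / MIN_ALLOC_MB), 1) * MIN_ALLOC_MB


def map_slots(free_mb: int) -> int:
    """How many map containers fit into the given free memory, by memory alone."""
    return free_mb // normalize(MAP_MB)


def cluster_map_slots() -> int:
    """Concurrent map containers across all NodeManagers (hypothetical helper)."""
    # Assumption: one node also hosts the ApplicationMaster container.
    am_node = map_slots(NODE_MEMORY_MB - normalize(AM_MB))
    other_nodes = map_slots(NODE_MEMORY_MB) * (DATANODES - 1)
    return am_node + other_nodes


print(normalize(MAP_MB), map_slots(NODE_MEMORY_MB), cluster_map_slots())
```

Under these assumptions each NodeManager has room for two 768 MB map containers (only one on the node running the AM), so roughly five maps could run at once, while the job in question only uses three. The sketch models memory only; it says nothing about how much CPU a running container may consume.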
