Hi,

We are running some test cases in parallel using the Maven Failsafe plugin. Most of the test cases submit MR/Tez/Oozie/Hive jobs. After some time the test cases get stuck. When we checked the job tracker, we noticed that some jobs were stuck in the RUNNING state while others were SUBMITTED but never got a chance to run.

Following are the details:

Hadoop version: 2.4.1
Nodes available: 3
YARN scheduler used: Capacity Scheduler

Configuration in yarn-site.xml:

yarn.nodemanager.resource.memory-mb: 4096
yarn.scheduler.minimum-allocation-mb: 512
mapreduce.map.memory.mb: 1536
mapreduce.reduce.memory.mb: 2560
mapreduce.map.java.opts: -Xmx512m
mapreduce.reduce.java.opts: -Xmx512m
yarn.nodemanager.vmem-pmem-ratio: 5
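As a rough check of what these settings imply per node, the arithmetic can be sketched in Python. This is only a sketch under assumptions: it assumes the scheduler rounds each memory request up to a multiple of yarn.scheduler.minimum-allocation-mb, and it assumes the MapReduce ApplicationMaster uses 1536 MB (yarn.app.mapreduce.am.resource.mb is not set in the configuration above, so that figure is a guess at the default). The function names are made up for illustration.

```python
import math

# Values taken from the configuration listed above.
NODE_MB = 4096        # yarn.nodemanager.resource.memory-mb
MIN_ALLOC_MB = 512    # yarn.scheduler.minimum-allocation-mb
MAP_MB = 1536         # mapreduce.map.memory.mb
REDUCE_MB = 2560      # mapreduce.reduce.memory.mb

# ASSUMPTION: not part of the configuration above; assumed default for
# yarn.app.mapreduce.am.resource.mb.
AM_MB = 1536


def normalized_request(request_mb, min_alloc_mb=MIN_ALLOC_MB):
    """Round a request up to the next multiple of the minimum allocation.

    ASSUMPTION: the scheduler normalizes requests this way.
    """
    return math.ceil(request_mb / min_alloc_mb) * min_alloc_mb


def containers_per_node(request_mb, node_mb=NODE_MB):
    """How many containers of this size fit on one otherwise empty node."""
    return node_mb // normalized_request(request_mb)


def fits(free_mb, request_mb):
    """True if a single container of this size fits into the free memory."""
    return normalized_request(request_mb) <= free_mb


if __name__ == "__main__":
    for name, mb in [("map", MAP_MB), ("reduce", REDUCE_MB), ("AM (assumed)", AM_MB)]:
        print(name, normalized_request(mb), "MB per container,",
              containers_per_node(mb), "per empty node")
```

With these numbers an empty 4096 MB node holds at most two map containers or one reduce container, so a handful of ApplicationMasters running at once could take a large share of the 12 GB in the cluster. The `fits` helper can be used to compare a node's free memory against a single request.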
The following resources were available when the jobs got stuck (there was sufficient CPU capacity available):

Case 1:
Node-1: 2.5 GB
Node-2: 1.5 GB
Node-3: 2.5 GB

Case 2:
Node-1: 0
Node-2: 0
Node-3: 4 GB

We are not able to find out:

1. Why are RUNNING jobs not completing even though some resources are available?
2. Why is YARN not using the available resources to complete the running jobs?
3. Is it possible that the resources the running jobs need in order to complete are more than what is available, and that is why the jobs do not complete? If so, how can we find out the resources/containers required by a job?
4. Are there any other properties we should check?

Any pointers or help on this would be much appreciated.

Thanks in advance.
Ravi