Hadoop: Running jobs stuck even though resources are available on nodes

Ravi Kumar Tue, 03 Mar 2015 23:53:59 -0800

Hi

We are running some test cases in parallel using the Maven Failsafe plugin. Most 
of the test cases submit MR/Tez/Oozie/Hive jobs.
After some time, the test cases get stuck. When we checked the job tracker, we 
noticed that some jobs are stuck in the RUNNING state and others are SUBMITTED 
but never get a chance to run.
The details are as follows:
Hadoop version: 2.4.1
Nodes available: 3
YARN scheduler used: Capacity Scheduler
Configuration in yarn-site.xml:
yarn.nodemanager.resource.memory-mb     4096
yarn.scheduler.minimum-allocation-mb    512
mapreduce.map.memory.mb                 1536
mapreduce.reduce.memory.mb              2560
mapreduce.map.java.opts                 -Xmx512m
mapreduce.reduce.java.opts              -Xmx512m
yarn.nodemanager.vmem-pmem-ratio        5
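
As a rough cross-check of the settings above, the following sketch computes the
container size each task type would receive and the virtual memory ceiling
implied by yarn.nodemanager.vmem-pmem-ratio. It assumes the scheduler rounds
every request up to a multiple of yarn.scheduler.minimum-allocation-mb. The
helper names normalize_mb and container_plan are illustrative only and are not
part of any Hadoop API.

```python
import math

# Values copied from the configuration listed above.
MIN_ALLOC_MB = 512
VMEM_PMEM_RATIO = 5
REQUEST_MB = {"map": 1536, "reduce": 2560}
HEAP_MB = {"map": 512, "reduce": 512}  # from the -Xmx512m settings


def normalize_mb(request_mb, minimum_mb):
    """Round a request up to the next multiple of the minimum allocation.

    Assumption: this mirrors how the scheduler normalizes memory requests.
    """
    return math.ceil(request_mb / minimum_mb) * minimum_mb


def container_plan():
    """Return per-task container size, virtual memory limit and heap size."""
    plan = {}
    for kind, request in REQUEST_MB.items():
        container = normalize_mb(request, MIN_ALLOC_MB)
        plan[kind] = {
            "container_mb": container,
            "vmem_limit_mb": container * VMEM_PMEM_RATIO,
            "heap_mb": HEAP_MB[kind],
        }
    return plan


if __name__ == "__main__":
    for kind, values in container_plan().items():
        print(kind, values)
```

With these values both requests are already multiples of 512 MB, so map
containers stay at 1536 MB and reduce containers at 2560 MB, while the
-Xmx512m heaps use only a fraction of each container.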


The following resources were available when the jobs were stuck (there was 
sufficient CPU capacity available):
Case-1:
Node-1: 2.5GB
Node-2: 1.5GB
Node-3: 2.5GB

Case-2:
Node-1: 0
Node-2: 0
Node-3: 4GB
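
To make the two cases concrete, the sketch below counts how many map or reduce
containers would fit into the free memory reported on each node. It assumes
1GB = 1024MB, that a container cannot span nodes, and that map and reduce
containers are 1536MB and 2560MB as configured. The function names are
hypothetical, and the sketch ignores queue-level and per-user limits entirely.

```python
# Container sizes in MB, taken from mapreduce.map.memory.mb and
# mapreduce.reduce.memory.mb in the configuration.
CONTAINER_MB = {"map": 1536, "reduce": 2560}

# Free memory per node in MB for the two reported cases
# (assumption: 2.5GB = 2560MB, 1.5GB = 1536MB, 4GB = 4096MB).
FREE_MB = {
    "case-1": {"node-1": 2560, "node-2": 1536, "node-3": 2560},
    "case-2": {"node-1": 0, "node-2": 0, "node-3": 4096},
}


def fits(free_mb):
    """Return how many containers of each type fit into free_mb on one node.

    The counts are alternatives (all maps OR all reduces), not a combined mix.
    """
    return {kind: free_mb // size for kind, size in CONTAINER_MB.items()}


def summarize(cases):
    """Apply fits() to every node of every case."""
    return {
        case: {node: fits(free) for node, free in nodes.items()}
        for case, nodes in cases.items()
    }


if __name__ == "__main__":
    for case, nodes in summarize(FREE_MB).items():
        for node, counts in nodes.items():
            print(case, node, counts)
```

If these numbers hold, at least one map container fits on some node in both
cases, so node memory alone would not obviously explain the stall. That may
point toward scheduler-level limits (for example queue capacity or per-user
limits) rather than free memory, though this is only a guess from the figures
given.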

We are not able to find out:
Why are the RUNNING jobs not completing even though some resources are 
available?
Why is YARN not using the available resources to complete the running jobs?
Could it be that the resources required for the running jobs to complete exceed 
what is available, and that is why the jobs are not completing?
If so, how can we find out the resources/containers required by a job?
Are there any other properties we should check?

Any pointers or help on this would be appreciated.

Thanks in advance.
Ravi


