Hi all,

We're using Hadoop 2.7.2.
Has anyone seen an issue where, every once in a while, the
ResourceManager gets into a weird state in which the Applications dashboard
shows jobs running, but no jobs are actually running on the cluster?
When this happens, RM CPU usage flat-lines at a very high level
(around 85%), while the DataNodes/NodeManagers show no load because
nothing is actually running. If we restart the RM and let it fail over to the
standby, the cluster returns to normal and starts running jobs again
after 15-30 minutes.

It's a strange situation. I'm not sure why the RM would fail to
realize that the running jobs have finished and that the jobs sitting in the
ACCEPTED state are free to run. It's also strange that the RM gets stuck at
such high CPU usage.

If anyone can point me in the right direction on how to debug or resolve
this, that would be much appreciated!

-- 
George A. Liaw

(408) 318-7920
george.a.l...@gmail.com
LinkedIn <http://www.linkedin.com/in/georgeliaw/>