Hi all, We're running Hadoop 2.7.2. Has anyone seen an issue where, every once in a while, the ResourceManager gets into a weird state in which the Applications dashboard shows jobs running, but no jobs are actually running on the cluster? When this happens, RM CPU usage flat-lines at a very high level (around 85%), while the DataNodes/NodeManagers have no load because nothing is running. If we restart the RM and let it fail over to the standby, the cluster goes back to normal and starts running jobs again after 15-30 minutes.
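One way to confirm the mismatch described above is to ask the RM's REST API which applications it believes are running and check how many containers it reports for each. The following is only a minimal sketch, not a tested diagnostic. It assumes the RM web service is reachable on the default port 8088, that the app objects returned by `/ws/v1/cluster/apps` carry a `runningContainers` field, and `rm-host` is a hypothetical hostname.

```python
import json
from urllib.request import urlopen

# Hypothetical RM address; replace with the active ResourceManager.
RM_URL = "http://rm-host:8088"


def fetch_apps(rm_url=RM_URL, states="RUNNING,ACCEPTED"):
    """Fetch apps in the given states from the RM REST API."""
    url = f"{rm_url}/ws/v1/cluster/apps?states={states}"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def find_phantom_apps(payload):
    """Return IDs of apps the RM reports as RUNNING with no running containers.

    Assumption: a genuinely running app holds at least one container
    (its ApplicationMaster), so RUNNING with <= 0 containers is suspicious.
    """
    # The RM returns {"apps": null} when nothing matches the filter.
    apps = (payload.get("apps") or {}).get("app") or []
    return [
        app["id"]
        for app in apps
        if app.get("state") == "RUNNING" and app.get("runningContainers", 0) <= 0
    ]


if __name__ == "__main__":
    for app_id in find_phantom_apps(fetch_apps()):
        print(app_id)
```

If this prints application IDs while the NodeManagers are idle, that matches the symptom, and a thread dump of the RM process (for example with `jstack`) taken during the CPU spike would show what it is busy doing.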
It's a strange situation: I'm not sure why the RM would fail to realize that the jobs have finished and that the jobs sitting in the ACCEPTED state are free to run. It's also strange that the RM gets stuck at high CPU usage. If anyone can point me in the right direction on how to debug or resolve this, that would be much appreciated!

--
George A. Liaw
(408) 318-7920
george.a.l...@gmail.com
LinkedIn <http://www.linkedin.com/in/georgeliaw/>