Hi,
I'm building a system to monitor my Hadoop cluster. I can get metrics about the cluster via Hadoop Metrics (https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/Metrics.html?spm=5176.2020520111.111.1.278ad103oLtdlm#NodeManagerMetrics):

ClusterMetrics

ClusterMetrics shows the metrics of the YARN cluster from the ResourceManager’s perspective. Each metrics record contains Hostname tag as additional information along with metrics.

    Name                    Description
    NumActiveNMs            Current number of active NodeManagers
    NumDecommissionedNMs    Current number of decommissioned NodeManagers
    NumLostNMs              Current number of lost NodeManagers for not sending heartbeats
    NumUnhealthyNMs         Current number of unhealthy NodeManagers
    NumRebootedNMs          Current number of rebooted NodeManagers

These metrics only give me counts. How can I find out which NodeManagers are unhealthy and which are lost? Ideally this could be done by calling the JMX REST API or a hadoop command.

Any suggestions are appreciated, thank you.

HUANG
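
One possible direction, offered as a sketch rather than a confirmed answer: the ResourceManager web services expose a Cluster Nodes endpoint (/ws/v1/cluster/nodes) that lists individual nodes and accepts a "states" query parameter, and the command line has "yarn node -list -states UNHEALTHY,LOST". Both should be checked against the documentation for your Hadoop version. The Python below assumes the endpoint returns JSON shaped like {"nodes": {"node": [...]}} with "id" and "state" fields on each node. The address in RM_URL and the helper names (extract_nodes, group_by_state, fetch_nodes) are hypothetical and only for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical address; replace with your ResourceManager's web address.
RM_URL = "http://resourcemanager.example.com:8088"


def extract_nodes(payload):
    """Return the node list from a Cluster Nodes API response.

    Assumes the shape {"nodes": {"node": [...]}}. The "nodes" value may be
    null or missing when no node matches, so fall back to an empty list.
    """
    nodes = (payload or {}).get("nodes") or {}
    return nodes.get("node") or []


def group_by_state(node_list):
    """Map each node state (e.g. "LOST") to the sorted ids of nodes in it."""
    grouped = {}
    for node in node_list:
        state = node.get("state", "UNKNOWN")
        grouped.setdefault(state, []).append(node.get("id", ""))
    return {state: sorted(ids) for state, ids in grouped.items()}


def fetch_nodes(rm_url, states=("UNHEALTHY", "LOST")):
    """Ask the ResourceManager for nodes in the given states (network call)."""
    query = urlencode({"states": ",".join(states)})
    request = Request(
        f"{rm_url}/ws/v1/cluster/nodes?{query}",
        headers={"Accept": "application/json"},
    )
    with urlopen(request, timeout=10) as response:
        return extract_nodes(json.load(response))


# Example use against a live cluster (not executed here):
#   print(group_by_state(fetch_nodes(RM_URL)))
```

The parsing is kept separate from the HTTP call so it can be checked against a saved response without a running cluster. The ResourceManager may only report lost nodes it has seen since it last started, so an empty LOST list is worth double-checking after a restart.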