Yang Wang created YARN-6966:
-------------------------------

             Summary: NodeManager metrics may return wrong negative values 
after restart
                 Key: YARN-6966
                 URL: https://issues.apache.org/jira/browse/YARN-6966
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Yang Wang


This is the same problem as YARN-6212. However, I do not think it is a duplicate of YARN-3933.
The primary cause of the negative values is that the metrics are not recovered 
properly when the NM restarts.
AllocatedContainers, ContainersLaunched, AllocatedGB, AvailableGB, AllocatedVCores, and AvailableVCores
 in the metrics also need to be recovered when the NM restarts.
This should be done in ContainerManagerImpl#recoverContainer.
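
To illustrate the accounting problem, here is a minimal sketch. ToyNodeMetrics and RecoverySketch are hypothetical stand-ins written for this example; they are not the real NodeManagerMetrics or ContainerManagerImpl code. The node capacity (3600 vcores) and the two container sizes (1 and 10 vcores) are assumptions, picked only because they produce the same values as the JMX output in this report. Memory gauges are left out for brevity.

```java
/** Toy stand-in for the NM gauges; NOT the real NodeManagerMetrics class. */
final class ToyNodeMetrics {
    int allocatedContainers;
    int allocatedVCores;
    int availableVCores;

    ToyNodeMetrics(int totalVCores) {
        this.availableVCores = totalVCores;
    }

    /** Called when a container is allocated (or, in the fix, recovered). */
    void allocate(int vcores) {
        allocatedContainers++;
        allocatedVCores += vcores;
        availableVCores -= vcores;
    }

    /** Called when a container finishes and its resources are released. */
    void release(int vcores) {
        allocatedContainers--;
        allocatedVCores -= vcores;
        availableVCores += vcores;
    }
}

final class RecoverySketch {
    /**
     * Simulates an NM restart while containers are running, followed by
     * the completion of those containers.
     */
    static ToyNodeMetrics restartThenFinish(int totalVCores, int[] runningVCores,
                                            boolean recoverMetrics) {
        // After a restart the gauges start from zero again.
        ToyNodeMetrics metrics = new ToyNodeMetrics(totalVCores);
        if (recoverMetrics) {
            // Proposed behaviour: re-count each recovered container.
            for (int v : runningVCores) {
                metrics.allocate(v);
            }
        }
        // The application stops, so every recovered container is released.
        for (int v : runningVCores) {
            metrics.release(v);
        }
        return metrics;
    }

    public static void main(String[] args) {
        int[] running = {1, 10}; // assumed container sizes in vcores
        ToyNodeMetrics broken = restartThenFinish(3600, running, false);
        ToyNodeMetrics fixed = restartThenFinish(3600, running, true);
        System.out.println(broken.allocatedContainers + " " + broken.allocatedVCores
            + " " + broken.availableVCores); // prints "-2 -11 3611"
        System.out.println(fixed.allocatedContainers + " " + fixed.allocatedVCores
            + " " + fixed.availableVCores);  // prints "0 0 3600"
    }
}
```

Without the recovery step, every release after the restart subtracts from gauges that were never incremented, which is how the allocated counters end up below zero and the available counters above the node capacity.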

The scenario can be reproduced with the following steps:
# Make sure YarnConfiguration.NM_RECOVERY_ENABLED=true and YarnConfiguration.NM_RECOVERY_SUPERVISED=true are set in the NM
# Submit an application and keep it running
# Restart the NM
# Stop the application
# The NM metrics now report negative values:
{code}
/jmx?qry=Hadoop:service=NodeManager,name=NodeManagerMetrics
{code}
{code}
{
name: "Hadoop:service=NodeManager,name=NodeManagerMetrics",
modelerType: "NodeManagerMetrics",
tag.Context: "yarn",
tag.Hostname: "hadoop1111.com",
ContainersLaunched: 0,
ContainersCompleted: 0,
ContainersFailed: 2,
ContainersKilled: 0,
ContainersIniting: 0,
ContainersRunning: 0,
AllocatedGB: 0,
AllocatedContainers: -2,
AvailableGB: 160,
AllocatedVCores: -11,
AvailableVCores: 3611,
ContainerLaunchDurationNumOps: 2,
ContainerLaunchDurationAvgTime: 6,
BadLocalDirs: 0,
BadLogDirs: 0,
GoodLocalDirsDiskUtilizationPerc: 2,
GoodLogDirsDiskUtilizationPerc: 2
}
{code}
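
When reproducing this, a small check like the following could flag the affected gauges in a saved copy of the /jmx output. This is only a sketch: NegativeGaugeCheck and findNegatives are hypothetical names, and the regular expression assumes simple "key: number" entries (with or without quotes around the key) like the ones shown above, not a full JSON parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NegativeGaugeCheck {
    // Matches entries such as  AllocatedVCores: -11  or  "AllocatedVCores" : -11
    private static final Pattern NEGATIVE_ENTRY =
        Pattern.compile("\"?([A-Za-z][A-Za-z0-9.]*)\"?\\s*:\\s*(-\\d+)");

    /** Returns every metric whose value is a negative integer, in input order. */
    static Map<String, Long> findNegatives(String jmxText) {
        Map<String, Long> negatives = new LinkedHashMap<>();
        Matcher m = NEGATIVE_ENTRY.matcher(jmxText);
        while (m.find()) {
            negatives.put(m.group(1), Long.parseLong(m.group(2)));
        }
        return negatives;
    }

    public static void main(String[] args) {
        // A few lines copied from the output above.
        String sample = "AllocatedGB: 0,\n"
            + "AllocatedContainers: -2,\n"
            + "AvailableGB: 160,\n"
            + "AllocatedVCores: -11,\n"
            + "AvailableVCores: 3611";
        System.out.println(findNegatives(sample));
        // prints {AllocatedContainers=-2, AllocatedVCores=-11}
    }
}
```

Note that this only catches the negative counters. An inflated value such as AvailableVCores: 3611 has to be compared against the configured node capacity instead.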



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

