[ https://issues.apache.org/jira/browse/YARN-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16116452#comment-16116452 ]
Yang Wang edited comment on YARN-6212 at 8/8/17 2:25 AM:
---------------------------------------------------------

Hi, [~miklos.szeg...@cloudera.com]

I'm afraid this JIRA is not a duplicate of YARN-3933. The primary cause of the negative values is that the metrics are not recovered properly when the NM restarts. *AllocatedContainers, ContainersLaunched, AllocatedGB, AvailableGB, AllocatedVCores, AvailableVCores* in the metrics need to be recovered when the NM restarts. This should be done in ContainerManagerImpl#recoverContainer.

The scenario can be reproduced with the following steps:
# Make sure YarnConfiguration.NM_RECOVERY_ENABLED=true and YarnConfiguration.NM_RECOVERY_SUPERVISED=true in the NM
# Submit an application and keep it running
# Restart the NM
# Stop the application
# The metrics now show negative values

> NodeManager metrics returning wrong negative values
> ---------------------------------------------------
>
>                 Key: YARN-6212
>                 URL: https://issues.apache.org/jira/browse/YARN-6212
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 2.7.3
>            Reporter: Abhishek Shivanna
>
> It looks like the metrics returned by the NodeManager have negative values
> for metrics that never should be negative.
> Here is the output from the NM endpoint
> {noformat}
> /jmx?qry=Hadoop:service=NodeManager,name=NodeManagerMetrics
> {noformat}
> {noformat}
> {
>   "beans" : [ {
>     "name" : "Hadoop:service=NodeManager,name=NodeManagerMetrics",
>     "modelerType" : "NodeManagerMetrics",
>     "tag.Context" : "yarn",
>     "tag.Hostname" : "<HOST>",
>     "ContainersLaunched" : 707,
>     "ContainersCompleted" : 9,
>     "ContainersFailed" : 124,
>     "ContainersKilled" : 579,
>     "ContainersIniting" : 0,
>     "ContainersRunning" : 19,
>     "AllocatedGB" : -26,
>     "AllocatedContainers" : -5,
>     "AvailableGB" : 252,
>     "AllocatedVCores" : -5,
>     "AvailableVCores" : 101,
>     "ContainerLaunchDurationNumOps" : 718,
>     "ContainerLaunchDurationAvgTime" : 18.0
>   } ]
> }
> {noformat}
> Is there any circumstance under which the values for AllocatedGB,
> AllocatedContainers and AllocatedVCores go below 0?

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
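
To make the restart scenario from the comment concrete, here is a minimal sketch of how allocation gauges can go negative when they are not replayed after an NM restart. It is a toy model, not the real NodeManagerMetrics or ContainerManagerImpl code: the class NmMetricsSketch, its Container type, the restart() helper and the 8 GB / 8 vcore node size are all hypothetical names and values chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of the NM allocation gauges (hypothetical, not Hadoop code). */
public class NmMetricsSketch {

    /** Hypothetical stand-in for a container's resource request. */
    static final class Container {
        final int memGB;
        final int vcores;

        Container(int memGB, int vcores) {
            this.memGB = memGB;
            this.vcores = vcores;
        }
    }

    int allocatedContainers;
    int allocatedGB;
    int allocatedVCores;
    int availableGB;
    int availableVCores;

    /** A freshly started NM: nothing allocated, everything available. */
    NmMetricsSketch(int totalGB, int totalVCores) {
        this.availableGB = totalGB;
        this.availableVCores = totalVCores;
    }

    /** Called when a container is allocated (or replayed during recovery). */
    void allocate(Container c) {
        allocatedContainers++;
        allocatedGB += c.memGB;
        allocatedVCores += c.vcores;
        availableGB -= c.memGB;
        availableVCores -= c.vcores;
    }

    /** Called when a container finishes and its resources are released. */
    void release(Container c) {
        allocatedContainers--;
        allocatedGB -= c.memGB;
        allocatedVCores -= c.vcores;
        availableGB += c.memGB;
        availableVCores += c.vcores;
    }

    /**
     * Simulates an NM restart: the in-memory gauges start from zero again.
     * If recoverMetrics is true, the still-running containers are replayed
     * into the gauges, which is the fix the comment proposes.
     */
    static NmMetricsSketch restart(int totalGB, int totalVCores,
                                   List<Container> running,
                                   boolean recoverMetrics) {
        NmMetricsSketch fresh = new NmMetricsSketch(totalGB, totalVCores);
        if (recoverMetrics) {
            for (Container c : running) {
                fresh.allocate(c);
            }
        }
        return fresh;
    }

    @Override
    public String toString() {
        return "AllocatedContainers=" + allocatedContainers
            + " AllocatedGB=" + allocatedGB
            + " AllocatedVCores=" + allocatedVCores
            + " AvailableGB=" + availableGB
            + " AvailableVCores=" + availableVCores;
    }

    public static void main(String[] args) {
        // One 2 GB / 1 vcore container keeps running across the restart.
        List<Container> running = Arrays.asList(new Container(2, 1));

        // Restart without recovery, then the application stops.
        NmMetricsSketch broken = restart(8, 8, running, false);
        broken.release(running.get(0));
        System.out.println("without recovery: " + broken);

        // Restart with recovery, then the application stops.
        NmMetricsSketch fixed = restart(8, 8, running, true);
        fixed.release(running.get(0));
        System.out.println("with recovery:    " + fixed);
    }
}
```

In this model the unrecovered run ends with the Allocated* gauges below zero and the Available* gauges above the node's capacity, while the recovered run returns to zero allocated and full capacity. Whether the real fix belongs exactly in ContainerManagerImpl#recoverContainer is the commenter's proposal; the sketch only shows why replaying running containers into the gauges keeps release calls balanced.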