[ https://issues.apache.org/jira/browse/YARN-10760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17340276#comment-17340276 ]
Andrew Chung commented on YARN-10760:
-------------------------------------

[~elgoiri] My purported fix did not work and we are still observing the issue in production. May need further investigation.

> Number of allocated OPPORTUNISTIC containers can dip below 0
> ------------------------------------------------------------
>
>                 Key: YARN-10760
>                 URL: https://issues.apache.org/jira/browse/YARN-10760
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.1.2
>            Reporter: Andrew Chung
>            Assignee: Andrew Chung
>            Priority: Minor
>
> {{AbstractYarnScheduler.completedContainers}} can potentially be called from
> multiple sources, yet it appears that there are scenarios in which the caller
> does not hold the appropriate lock, which can lead to the count of
> {{OpportunisticSchedulerMetrics.AllocatedOContainers}} falling below 0.
> To prevent double counting when releasing allocated O containers, a simple
> fix might be to check if the {{RMContainer}} has already been removed
> beforehand, though that may not fix the underlying issue that causes the race
> condition.
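
The guard suggested in the description can be sketched as follows. This is a minimal illustration and not the actual YARN code: the class OContainerReleaseSketch, its methods, and the string container IDs are hypothetical stand-ins for the scheduler's bookkeeping. The idea is that removal from a concurrent map succeeds for exactly one caller, so the gauge is decremented at most once per container even if the completion path is entered twice.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the scheduler's live-container bookkeeping.
// None of these names are real YARN classes or methods.
class OContainerReleaseSketch {
  private final Map<String, Boolean> liveContainers = new ConcurrentHashMap<>();
  private final AtomicInteger allocatedOContainers = new AtomicInteger();

  void allocate(String containerId) {
    // putIfAbsent returns null only for the first caller that registers this id.
    if (liveContainers.putIfAbsent(containerId, Boolean.TRUE) == null) {
      allocatedOContainers.incrementAndGet();
    }
  }

  void completed(String containerId) {
    // remove returns non-null for exactly one caller per id, so a second
    // completion of the same container leaves the gauge untouched.
    if (liveContainers.remove(containerId) != null) {
      allocatedOContainers.decrementAndGet();
    }
  }

  int allocated() {
    return allocatedOContainers.get();
  }

  public static void main(String[] args) {
    OContainerReleaseSketch sketch = new OContainerReleaseSketch();
    sketch.allocate("container_1");
    sketch.completed("container_1");
    sketch.completed("container_1"); // duplicate completion is ignored
    System.out.println(sketch.allocated());
  }
}
```

As the description already notes, a guard of this kind only prevents the double count; it does not explain why the completion path is reached more than once.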
> The following is a capture of
> {{OpportunisticSchedulerMetrics.AllocatedOContainers}} falling below 0 via a
> JMX query:
> {noformat}
> {
>   "name" : "Hadoop:service=ResourceManager,name=OpportunisticSchedulerMetrics",
>   "modelerType" : "OpportunisticSchedulerMetrics",
>   "tag.OpportunisticSchedulerMetrics" : "ResourceManager",
>   "tag.Context" : "yarn",
>   "tag.Hostname" : "",
>   "AllocatedOContainers" : -2716,
>   "AggregateOContainersAllocated" : 306020,
>   "AggregateOContainersReleased" : 308736,
>   "AggregateNodeLocalOContainersAllocated" : 0,
>   "AggregateRackLocalOContainersAllocated" : 0,
>   "AggregateOffSwitchOContainersAllocated" : 306020,
>   "AllocateLatencyOQuantilesNumOps" : 0,
>   "AllocateLatencyOQuantiles50thPercentileTime" : 0,
>   "AllocateLatencyOQuantiles75thPercentileTime" : 0,
>   "AllocateLatencyOQuantiles90thPercentileTime" : 0,
>   "AllocateLatencyOQuantiles95thPercentileTime" : 0,
>   "AllocateLatencyOQuantiles99thPercentileTime" : 0
> }
> {noformat}
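
The numbers in the capture are internally consistent: the negative gauge equals the aggregate allocated count minus the aggregate released count, which fits the picture of releases being counted more often than allocations. The short check below reproduces that arithmetic. The class and method names are hypothetical, and whether YARN derives the gauge this way internally is an assumption; only the three values are taken from the capture.

```java
// Hypothetical helper; only the numeric values come from the JMX capture.
class OContainerGaugeCheck {
  // Assumption: the gauge tracks allocations minus releases.
  static long expectedGauge(long aggregateAllocated, long aggregateReleased) {
    return aggregateAllocated - aggregateReleased;
  }

  public static void main(String[] args) {
    long aggregateAllocated = 306020L; // AggregateOContainersAllocated
    long aggregateReleased = 308736L;  // AggregateOContainersReleased
    // Prints -2716, matching AllocatedOContainers in the capture.
    System.out.println(expectedGauge(aggregateAllocated, aggregateReleased));
  }
}
```

Under that reading, 2716 more releases than allocations had been recorded at the time of the query.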