[
https://issues.apache.org/jira/browse/YARN-10760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335605#comment-17335605
]
Íñigo Goiri commented on YARN-10760:
------------------------------------
Thanks [~afchung90], could you create a PR for this?
> Number of allocated OPPORTUNISTIC containers can dip below 0
> ------------------------------------------------------------
>
> Key: YARN-10760
> URL: https://issues.apache.org/jira/browse/YARN-10760
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 3.1.2
> Reporter: Andrew Chung
> Assignee: Andrew Chung
> Priority: Minor
>
> {{AbstractYarnScheduler.completedContainers}} can be called from multiple
> sources, and in some scenarios the caller does not appear to hold the
> appropriate lock. This can cause the count of
> {{OpportunisticSchedulerMetrics.AllocatedOContainers}} to fall below 0.
> To prevent double counting when releasing allocated O containers, a simple
> fix might be to check beforehand whether the {{RMContainer}} has already been
> removed, though that may not fix the underlying issue that causes the race
> condition.
> The following is a capture of
> {{OpportunisticSchedulerMetrics.AllocatedOContainers}} falling below 0,
> taken via a JMX query:
> {noformat}
> {
> "name" : "Hadoop:service=ResourceManager,name=OpportunisticSchedulerMetrics",
> "modelerType" : "OpportunisticSchedulerMetrics",
> "tag.OpportunisticSchedulerMetrics" : "ResourceManager",
> "tag.Context" : "yarn",
> "tag.Hostname" : "",
> "AllocatedOContainers" : -2716,
> "AggregateOContainersAllocated" : 306020,
> "AggregateOContainersReleased" : 308736,
> "AggregateNodeLocalOContainersAllocated" : 0,
> "AggregateRackLocalOContainersAllocated" : 0,
> "AggregateOffSwitchOContainersAllocated" : 306020,
> "AllocateLatencyOQuantilesNumOps" : 0,
> "AllocateLatencyOQuantiles50thPercentileTime" : 0,
> "AllocateLatencyOQuantiles75thPercentileTime" : 0,
> "AllocateLatencyOQuantiles90thPercentileTime" : 0,
> "AllocateLatencyOQuantiles95thPercentileTime" : 0,
> "AllocateLatencyOQuantiles99thPercentileTime" : 0
> }
> {noformat}
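As an illustration of the guard suggested in the description, here is a minimal sketch of how a duplicate release can push a gauge below 0, and how checking whether the container was already removed avoids it. It is not Hadoop code: the class OContainerReleaseSketch and its methods are hypothetical stand-ins for the scheduler's bookkeeping, and it assumes the double decrement comes from the same container being completed twice. Note that the capture above is consistent with releases outnumbering allocations: 306020 - 308736 = -2716.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the scheduler's bookkeeping; not actual YARN code.
class OContainerReleaseSketch {
    // Plays the role of the AllocatedOContainers gauge.
    private final AtomicInteger allocated = new AtomicInteger();
    // IDs of containers that are still considered live.
    private final Set<String> live = ConcurrentHashMap.newKeySet();

    void allocate(String containerId) {
        if (live.add(containerId)) {
            allocated.incrementAndGet();
        }
    }

    // Unguarded: every call decrements, so a duplicate completion undercounts.
    void releaseUnguarded(String containerId) {
        live.remove(containerId);
        allocated.decrementAndGet();
    }

    // Guarded: Set.remove returns true only for the call that actually removed
    // the ID, so the gauge is decremented at most once per container.
    void releaseGuarded(String containerId) {
        if (live.remove(containerId)) {
            allocated.decrementAndGet();
        }
    }

    int allocatedCount() {
        return allocated.get();
    }

    public static void main(String[] args) {
        OContainerReleaseSketch unguarded = new OContainerReleaseSketch();
        unguarded.allocate("c1");
        unguarded.releaseUnguarded("c1");
        unguarded.releaseUnguarded("c1"); // duplicate completion event
        System.out.println("unguarded: " + unguarded.allocatedCount()); // -1

        OContainerReleaseSketch guarded = new OContainerReleaseSketch();
        guarded.allocate("c1");
        guarded.releaseGuarded("c1");
        guarded.releaseGuarded("c1"); // duplicate is ignored
        System.out.println("guarded: " + guarded.allocatedCount()); // 0
    }
}
```

As the description notes, a guard like this only keeps the metric consistent. It does not explain why the same container is completed more than once, so the missing lock would still need to be tracked down.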
--
This message was sent by Atlassian Jira
(v8.3.4#803005)