Andrew Chung created YARN-10760:
-----------------------------------
Summary: Number of allocated OPPORTUNISTIC containers can dip below 0
Key: YARN-10760
URL: https://issues.apache.org/jira/browse/YARN-10760
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Andrew Chung
{{AbstractYarnScheduler.completedContainers}} can be called from multiple
sources, and in some scenarios the caller does not appear to hold the
appropriate lock. The same container release can then be counted more than
once, driving {{OpportunisticSchedulerMetrics.AllocatedOContainers}} below 0.
To prevent double counting when releasing allocated O containers, a simple fix
might be to check whether the {{RMContainer}} has already been removed before
updating the metric, though that may not fix the underlying race condition.
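A minimal sketch of that check follows. It is not the actual Hadoop code: {{OContainerTracker}}, {{allocate}}, {{release}} and {{allocatedCount}} are hypothetical names, and a string ID stands in for the {{RMContainer}}. The idea is that the gauge is decremented only by the caller that actually removed the container, so a second, racing release becomes a no-op.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the scheduler's container bookkeeping; not Hadoop code.
final class OContainerTracker {
    // Live OPPORTUNISTIC containers, keyed by a container ID string.
    private final ConcurrentHashMap<String, Boolean> live = new ConcurrentHashMap<>();
    // Plays the role of the AllocatedOContainers gauge.
    private final AtomicInteger allocated = new AtomicInteger();

    /** Registers a container; counts it only if it was not already tracked. */
    boolean allocate(String containerId) {
        if (live.putIfAbsent(containerId, Boolean.TRUE) == null) {
            allocated.incrementAndGet();
            return true;
        }
        return false;
    }

    /** Releases a container; only the caller that removed it updates the gauge. */
    boolean release(String containerId) {
        if (live.remove(containerId) != null) {
            allocated.decrementAndGet();
            return true;
        }
        return false; // already removed by another caller: skip the metric update
    }

    int allocatedCount() {
        return allocated.get();
    }

    public static void main(String[] args) throws InterruptedException {
        OContainerTracker tracker = new OContainerTracker();
        tracker.allocate("container_1");
        // Two threads race to complete the same container.
        Thread a = new Thread(() -> tracker.release("container_1"));
        Thread b = new Thread(() -> tracker.release("container_1"));
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(tracker.allocatedCount()); // prints 0, never -1
    }
}
```

This keeps the count from going negative, but as noted it only masks a missing lock; it does not explain why two callers reach the release path for the same container.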
The following JMX query output shows
{{OpportunisticSchedulerMetrics.AllocatedOContainers}} at a negative value:
{noformat}
{
"name" : "Hadoop:service=ResourceManager,name=OpportunisticSchedulerMetrics",
"modelerType" : "OpportunisticSchedulerMetrics",
"tag.OpportunisticSchedulerMetrics" : "ResourceManager",
"tag.Context" : "yarn",
"tag.Hostname" : "",
"AllocatedOContainers" : -2716,
"AggregateOContainersAllocated" : 306020,
"AggregateOContainersReleased" : 308736,
"AggregateNodeLocalOContainersAllocated" : 0,
"AggregateRackLocalOContainersAllocated" : 0,
"AggregateOffSwitchOContainersAllocated" : 306020,
"AllocateLatencyOQuantilesNumOps" : 0,
"AllocateLatencyOQuantiles50thPercentileTime" : 0,
"AllocateLatencyOQuantiles75thPercentileTime" : 0,
"AllocateLatencyOQuantiles90thPercentileTime" : 0,
"AllocateLatencyOQuantiles95thPercentileTime" : 0,
"AllocateLatencyOQuantiles99thPercentileTime" : 0
}
{noformat}
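As a quick sanity check on those figures, the negative gauge equals the difference between the two aggregate counters, which is consistent with more releases than allocations having been recorded. The snippet below only redoes that arithmetic with the values copied from the output; {{OContainerGaugeCheck}} and {{net}} are hypothetical names, and the assumption that the gauge tracks allocated minus released is inferred from the numbers, not confirmed from the code.

```java
// Hypothetical helper; values are copied from the JMX output above.
final class OContainerGaugeCheck {
    /** Net outstanding containers, assuming gauge = allocated - released. */
    static long net(long aggregateAllocated, long aggregateReleased) {
        return aggregateAllocated - aggregateReleased;
    }

    public static void main(String[] args) {
        long aggregateAllocated = 306020L; // AggregateOContainersAllocated
        long aggregateReleased = 308736L;  // AggregateOContainersReleased
        // prints -2716, matching the reported AllocatedOContainers
        System.out.println(net(aggregateAllocated, aggregateReleased));
    }
}
```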