[
https://issues.apache.org/jira/browse/YARN-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
jiulongzhu updated YARN-9838:
-----------------------------
Attachment: (was: bug_fix_capacityScheduler_moveApplication.patch)
> Using the CapacityScheduler, applying "movetoqueue" to an application that CS
> has reserved containers for causes errors in the "Num Container" and "Used
> Resource" ResourceUsage metrics
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-9838
> URL: https://issues.apache.org/jira/browse/YARN-9838
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacity scheduler
> Affects Versions: 2.7.3
> Reporter: jiulongzhu
> Priority: Critical
> Labels: patch
> Fix For: 2.7.3
>
> Attachments: RM_UI_metric_negative.png, RM_UI_metric_positive.png
>
>
> In some of our clusters, "Used Resource", "Used Capacity", "Absolute Used
> Capacity" and "Num Container" show nonzero values, positive or negative,
> while the queue is completely idle (no RUNNING apps, no NEW apps, ...). In
> extreme cases, apps cannot be submitted to a queue that is actually idle
> because its "Used Resource" is far above zero, much like a "Container Leak".
>
> First, I found that "Used Resource", "Used Capacity" and "Absolute Used
> Capacity" use the "Used" value of ResourceUsage kept by AbstractCSQueue, and
> "Num Container" uses the "numContainer" value kept by LeafQueue.
> AbstractCSQueue#allocateResource and AbstractCSQueue#releaseResource change
> both "numContainer" and "Used".
>
> Second, I compared how numContainer, ResourceUsageByLabel and QueueMetrics
> change (in #allocateContainer and #releaseContainer) for applications with
> and without "movetoqueue". I found that moving the reservedContainers does
> not modify the "numContainer" value in AbstractCSQueue or the "used" value in
> ResourceUsage when the application is moved from one queue to another.
>
> The table below shows how the metric values change when reservedContainers
> are allocated, moved from the $FROM queue to the $TO queue, and released. The
> increases and decreases do not balance: the resource is allocated from the
> $FROM queue but released to the $TO queue.
> ||move reservedContainer||allocate||movetoqueue||release||
> |numContainer|increase in $FROM queue|{color:#FF0000}$FROM queue stays the same, $TO queue stays the same{color}|decrease in $TO queue|
> |ResourceUsageByLabel(USED)|increase in $FROM queue|{color:#FF0000}$FROM queue stays the same, $TO queue stays the same{color}|decrease in $TO queue|
> |QueueMetrics|increase in $FROM queue|decrease in $FROM queue, increase in $TO queue|decrease in $TO queue|
> For allocatedContainers (allocated, acquired, running), the metric changes
> across allocate, movetoqueue and release balance out completely.
>
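The imbalance described above can be reproduced with a toy model. The sketch below is only an illustration under stated assumptions: `QueueMoveSketch`, `ToyQueue`, `simulate` and the 2048 MB reservation size are hypothetical names and values, not CapacityScheduler classes, and the "adjust on move" branch is one possible way to keep the counters balanced, not necessarily what the attached patch does.

```java
import java.util.Arrays;

/** Toy model of per-queue accounting; NOT the real CapacityScheduler classes. */
class QueueMoveSketch {

    /** Hypothetical stand-in for a leaf queue's counters. */
    static final class ToyQueue {
        final String name;
        int numContainers; // models "Num Container"
        long usedMb;       // models the "Used" value of ResourceUsage

        ToyQueue(String name) {
            this.name = name;
        }

        void allocate(long mb) {
            numContainers++;
            usedMb += mb;
        }

        void release(long mb) {
            numContainers--;
            usedMb -= mb;
        }
    }

    /**
     * Runs allocate -> movetoqueue -> release for one reserved container.
     * Returns {from.numContainers, from.usedMb, to.numContainers, to.usedMb}.
     */
    static long[] simulate(boolean adjustOnMove) {
        ToyQueue from = new ToyQueue("FROM");
        ToyQueue to = new ToyQueue("TO");
        long reservedMb = 2048; // assumed size of the reserved container

        // allocate: the reservation is counted in the $FROM queue
        from.allocate(reservedMb);

        // movetoqueue: in the reported behaviour neither queue is updated;
        // the balanced variant transfers the counters to the $TO queue
        if (adjustOnMove) {
            from.release(reservedMb);
            to.allocate(reservedMb);
        }

        // release: the reservation is released against the $TO queue
        to.release(reservedMb);

        return new long[] {from.numContainers, from.usedMb, to.numContainers, to.usedMb};
    }

    public static void main(String[] args) {
        System.out.println("no adjustment on move: " + Arrays.toString(simulate(false)));
        System.out.println("adjustment on move:    " + Arrays.toString(simulate(true)));
    }
}
```

Without the adjustment, the model leaves the idle $FROM queue with one container and 2048 MB still counted as used, and the idle $TO queue with matching negative values, which corresponds to the "positive or negative" readings in the report. With the adjustment, both queues return to zero.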
--
This message was sent by Atlassian Jira
(v8.3.2#803003)