[
https://issues.apache.org/jira/browse/YARN-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949330#comment-16949330
]
Tao Yang commented on YARN-9838:
--------------------------------
Thanks [~jiulongZhu] for fixing this issue.
The patch looks good to me in general; some minor suggestions:
* Checkstyle warnings need to be fixed. After that, you can run
"dev-support/bin/test-patch /path/to/my.patch" to confirm.
* In LeafQueue, the indentation of the updated log statement needs to be
adjusted, and the unnecessary deletion of a blank line should be reverted.
* The comment "sync ResourceUsageByLabel ResourceUsageByUser and
numContainer" can be removed, since it seems unnecessary to add details there.
* As for the UT, you can remove the before-fix block and keep only the correct
verification. Moreover, I think it's better to remove "//YARN-9838", since the
origin can easily be found via git. Also, the "/** */" comment style is usually
used for classes or methods; inside a method it's better to use "//" or "/* */".
> Using the CapacityScheduler, Apply "movetoqueue" on the application which CS
> reserved containers for, will cause "Num Container" and "Used Resource" in
> ResourceUsage metrics error
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-9838
> URL: https://issues.apache.org/jira/browse/YARN-9838
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacity scheduler
> Affects Versions: 2.7.3
> Reporter: jiulongzhu
> Priority: Critical
> Labels: patch
> Fix For: 2.7.3
>
> Attachments: RM_UI_metric_negative.png, RM_UI_metric_positive.png,
> YARN-9838.0001.patch
>
>
> In some of our clusters, we see "Used Resource", "Used Capacity", "Absolute
> Used Capacity" and "Num Container" remain positive or negative while the queue
> is completely idle (no RUNNING, no NEW apps...). In extreme cases, apps cannot
> be submitted to a queue that is actually idle because its "Used Resource" is
> far above zero, which looks like a "container leak".
> First, I found that "Used Resource", "Used Capacity" and "Absolute Used
> Capacity" are derived from the "Used" value of the ResourceUsage kept by
> AbstractCSQueue, while "Num Container" uses the "numContainer" value kept by
> LeafQueue. AbstractCSQueue#allocateResource and
> AbstractCSQueue#releaseResource change both "numContainer" and "Used".
> Second, by comparing how numContainer, ResourceUsageByLabel and QueueMetrics
> change (#allocateContainer and #releaseContainer) for applications with and
> without "movetoqueue", I found that moving the reservedContainers modified
> neither the "numContainer" value in AbstractCSQueue nor the "used" value in
> ResourceUsage when the application was moved from one queue to another.
> The table below shows how the metric values change when reservedContainers
> are allocated, moved from the $FROM queue to the $TO queue, and released.
> The increases and decreases do not balance: the resource is charged to the
> $FROM queue on allocation but subtracted from the $TO queue on release.
> ||move reservedContainer||allocate||movetoqueue||release||
> |numContainer|increase in $FROM queue|{color:#FF0000}$FROM queue stays the same, $TO queue stays the same{color}|decrease in $TO queue|
> |ResourceUsageByLabel(USED)|increase in $FROM queue|{color:#FF0000}$FROM queue stays the same, $TO queue stays the same{color}|decrease in $TO queue|
> |QueueMetrics|increase in $FROM queue|decrease in $FROM queue, increase in $TO queue|decrease in $TO queue|
> By contrast, for allocatedContainers (allocated, acquired, running), the
> metric changes on allocation, movetoqueue and release are fully balanced.
>
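The imbalance shown in the table can be reproduced with a small toy model. This is a hypothetical sketch, not CapacityScheduler code: `ReservedMoveAccounting`, `QueueCounters`, `charge`, `uncharge` and `simulate` are illustrative names, the two counters only stand in for "numContainer" and ResourceUsage(USED), and the 1024 MB container size is arbitrary. The "with transfer" path mirrors what the table reports QueueMetrics already doing on move; it is not a claim about how YARN-9838.0001.patch implements the fix.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (hypothetical, NOT Hadoop code) of per-queue counters:
// a container count and a "Used" memory figure in MB.
public class ReservedMoveAccounting {

    static final class QueueCounters {
        int numContainers;
        long usedMb;

        void charge(long mb)   { numContainers++; usedMb += mb; }
        void uncharge(long mb) { numContainers--; usedMb -= mb; }

        @Override
        public String toString() {
            return "(" + numContainers + ", " + usedMb + "MB)";
        }
    }

    // Replays allocate -> movetoqueue -> release for a single reserved container.
    // transferOnMove == false mimics the reported behavior: the move leaves both
    // queues' counters untouched. transferOnMove == true moves the charge across.
    static Map<String, QueueCounters> simulate(boolean transferOnMove, long mb) {
        Map<String, QueueCounters> queues = new HashMap<>();
        queues.put("from", new QueueCounters());
        queues.put("to", new QueueCounters());

        queues.get("from").charge(mb);          // reservation is charged to $FROM
        if (transferOnMove) {
            queues.get("from").uncharge(mb);    // remove the charge from $FROM
            queues.get("to").charge(mb);        // and add it to $TO
        }
        queues.get("to").uncharge(mb);          // the later release hits $TO
        return queues;
    }

    public static void main(String[] args) {
        Map<String, QueueCounters> noTransfer = simulate(false, 1024);
        Map<String, QueueCounters> withTransfer = simulate(true, 1024);
        // Prints: no transfer: from=(1, 1024MB) to=(-1, -1024MB)
        System.out.println("no transfer: from=" + noTransfer.get("from")
            + " to=" + noTransfer.get("to"));
        // Prints: with transfer: from=(0, 0MB) to=(0, 0MB)
        System.out.println("with transfer: from=" + withTransfer.get("from")
            + " to=" + withTransfer.get("to"));
    }
}
```

Without the transfer, the idle $FROM queue keeps a phantom container and 1024 MB of "Used", while the $TO queue goes negative by the same amount, matching the positive/negative readings described above. Charging the target queue and uncharging the source queue on move keeps both sides at zero after release.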
--
This message was sent by Atlassian Jira
(v8.3.4#803005)