[ https://issues.apache.org/jira/browse/YARN-9432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835257#comment-16835257 ]
Hudson commented on YARN-9432:
------------------------------
SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16522 (See
[https://builds.apache.org/job/Hadoop-trunk-Commit/16522/])
YARN-9432. Reserved containers leak after its request has been cancelled (wwei:
rev c336af3847add969303c95ea5af2fb76e0c086ab)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerMultiNodes.java
> Reserved containers leak after its request has been cancelled or satisfied
> when multi-nodes enabled
> ---------------------------------------------------------------------------------------------------
>
> Key: YARN-9432
> URL: https://issues.apache.org/jira/browse/YARN-9432
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Reporter: Tao Yang
> Assignee: Tao Yang
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9432.001.patch, YARN-9432.002.patch,
> YARN-9432.003.patch, YARN-9432.004.patch
>
>
> Reserved containers may become excess after their request has been
> cancelled or satisfied. Excess reserved containers need to be unreserved
> quickly to release resources for others.
> When multi-nodes is disabled, excess reserved containers are released
> quickly on the next node heartbeat. The call stack is
> CapacityScheduler#nodeUpdate --> CapacityScheduler#allocateContainersToNode
> --> CapacityScheduler#allocateContainerOnSingleNode.
> When multi-nodes is enabled, however, excess reserved containers can be
> released only during the allocation process. The key part of the call
> stack is
> LeafQueue#assignContainers --> LeafQueue#allocateFromReservedContainer.
> As a result, excess reserved containers may not be released until their
> queue has a pending request and gets a chance to allocate. In the worst
> case, they are never released and keep holding resources, because the
> queue has no additional pending requests.
> To solve this problem, I propose directly killing excess reserved
> containers when the request is satisfied (in FiCaSchedulerApp#apply) or when
> the allocation number of resource-requests/scheduling-requests is updated
> to 0 (in SchedulerApplicationAttempt#updateResourceRequests /
> SchedulerApplicationAttempt#updateSchedulingRequests).
> Please feel free to give your suggestions. Thanks.
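
For readers following the description above, here is a minimal toy sketch of the idea, not the actual patch. It assumes a much simplified model: every type and method name (ExcessReservationSketch, App, Reservation, updateRequest, onAllocated, updateRequestLeaky) is hypothetical and does not exist in YARN. It only illustrates the difference between leaving a reservation in place after its request drops to 0 and releasing it eagerly at that moment.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Toy model only; none of these types are YARN classes. */
public class ExcessReservationSketch {

    /** A reservation held on a node for one request key (hypothetical type). */
    static final class Reservation {
        final String nodeId;
        final String requestKey;

        Reservation(String nodeId, String requestKey) {
            this.nodeId = nodeId;
            this.requestKey = requestKey;
        }
    }

    /** Minimal stand-in for an application attempt (hypothetical type). */
    static final class App {
        private final Map<String, Integer> pending = new HashMap<>();
        private final List<Reservation> reserved = new ArrayList<>();

        void reserve(String nodeId, String requestKey) {
            reserved.add(new Reservation(nodeId, requestKey));
        }

        /** Updates the outstanding count only; models the leaky behavior. */
        void updateRequestLeaky(String requestKey, int count) {
            pending.put(requestKey, count);
        }

        /** Updates the count and drops reservations as soon as it reaches 0. */
        List<Reservation> updateRequest(String requestKey, int count) {
            pending.put(requestKey, count);
            return count == 0 ? releaseExcess(requestKey) : new ArrayList<>();
        }

        /** Called when an allocation satisfies one container of the request. */
        List<Reservation> onAllocated(String requestKey) {
            int left = Math.max(0, pending.getOrDefault(requestKey, 0) - 1);
            pending.put(requestKey, left);
            return left == 0 ? releaseExcess(requestKey) : new ArrayList<>();
        }

        /** Removes and returns every reservation held for the given request. */
        private List<Reservation> releaseExcess(String requestKey) {
            List<Reservation> released = new ArrayList<>();
            Iterator<Reservation> it = reserved.iterator();
            while (it.hasNext()) {
                Reservation r = it.next();
                if (r.requestKey.equals(requestKey)) {
                    it.remove();
                    released.add(r);
                }
            }
            return released;
        }

        int reservedCount() {
            return reserved.size();
        }
    }

    public static void main(String[] args) {
        App leaky = new App();
        leaky.updateRequestLeaky("prio-1", 1);
        leaky.reserve("node-a", "prio-1");
        leaky.updateRequestLeaky("prio-1", 0); // request cancelled
        System.out.println("leaky reserved = " + leaky.reservedCount()); // 1

        App fixed = new App();
        fixed.updateRequest("prio-1", 1);
        fixed.reserve("node-a", "prio-1");
        fixed.updateRequest("prio-1", 0); // request cancelled
        System.out.println("fixed reserved = " + fixed.reservedCount()); // 0
    }
}
```

In the leaky variant the reservation survives until some later allocation pass happens to revisit it, which in this toy model never occurs. In the eager variant the release is tied to the two events named in the description: the request being satisfied and the request count being updated to 0.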