[ https://issues.apache.org/jira/browse/YARN-9432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tao Yang updated YARN-9432:
---------------------------
Attachment: YARN-9432.003.patch
> Reserved containers leak after its request has been cancelled or satisfied
> when multi-nodes enabled
> ---------------------------------------------------------------------------------------------------
>
> Key: YARN-9432
> URL: https://issues.apache.org/jira/browse/YARN-9432
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Reporter: Tao Yang
> Assignee: Tao Yang
> Priority: Major
> Attachments: YARN-9432.001.patch, YARN-9432.002.patch,
> YARN-9432.003.patch
>
>
> Reserved containers may become excess after their request has been
> cancelled or satisfied. Excess reserved containers need to be unreserved
> quickly so that the resources they hold are released for other requests.
> When multi-node placement is disabled, excess reserved containers are
> released quickly on the next node heartbeat, through the call stack
> CapacityScheduler#nodeUpdate --> CapacityScheduler#allocateContainersToNode
> --> CapacityScheduler#allocateContainerOnSingleNode.
> When multi-node placement is enabled, however, excess reserved containers
> can only be released during the allocation process; the key part of the call
> stack is LeafQueue#assignContainers --> LeafQueue#allocateFromReservedContainer.
> As a result, excess reserved containers may not be released until their
> queue has a pending request and gets a chance to allocate. In the worst case,
> if the queue has no further pending requests, excess reserved containers are
> never released and keep holding resources indefinitely.
> To solve this problem, I propose directly killing excess reserved containers
> when their request is satisfied (in FiCaSchedulerApp#apply) or when the
> number of allocations in resource requests/scheduling requests is updated
> to 0 (in SchedulerApplicationAttempt#updateResourceRequests /
> SchedulerApplicationAttempt#updateSchedulingRequests).
> Please feel free to give your suggestions. Thanks.
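The proposed rule (kill reservations as soon as they outnumber the outstanding allocations, rather than waiting for the next allocation pass) can be illustrated with a minimal, self-contained Java model. This is a hypothetical sketch, not the YARN-9432 patch and not CapacityScheduler code: the class `ExcessReservationSketch` and its methods `reserve`, `updateRequest` and `allocate` are invented for illustration, request keys and container ids are plain strings, and a reserved container is assumed to be "excess" once the reservations for a key exceed that key's outstanding allocation count.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the proposed fix; not Hadoop YARN code. */
public class ExcessReservationSketch {
    // Outstanding allocation count per request key (stand-in for a request's number of containers).
    private final Map<String, Integer> pending = new HashMap<>();
    // Reserved container ids per request key.
    private final Map<String, List<String>> reserved = new HashMap<>();
    // Containers killed because their reservation became excess.
    private final List<String> killed = new ArrayList<>();

    public void reserve(String key, String containerId) {
        reserved.computeIfAbsent(key, k -> new ArrayList<>()).add(containerId);
    }

    /** Loosely analogous to updateResourceRequests: setting 0 models a cancelled request. */
    public void updateRequest(String key, int numAllocations) {
        pending.put(key, numAllocations);
        releaseExcess(key);
    }

    /** Loosely analogous to apply(): one outstanding allocation was satisfied on some other node. */
    public void allocate(String key) {
        int outstanding = Math.max(0, pending.getOrDefault(key, 0) - 1);
        pending.put(key, outstanding);
        releaseExcess(key);
    }

    // Kill reserved containers immediately instead of waiting for the next allocation pass.
    private void releaseExcess(String key) {
        List<String> list = reserved.get(key);
        if (list == null) {
            return;
        }
        int outstanding = pending.getOrDefault(key, 0);
        while (list.size() > outstanding) {
            killed.add(list.remove(list.size() - 1));
        }
    }

    public List<String> killedContainers() {
        return killed;
    }

    public int reservedCount(String key) {
        return reserved.getOrDefault(key, Collections.emptyList()).size();
    }

    public static void main(String[] args) {
        ExcessReservationSketch app = new ExcessReservationSketch();
        app.updateRequest("key1", 2);
        app.reserve("key1", "c1");
        app.reserve("key1", "c2");
        app.allocate("key1");          // request satisfied elsewhere: c2 becomes excess
        app.updateRequest("key1", 0);  // request cancelled: c1 becomes excess
        System.out.println(app.killedContainers()); // prints [c2, c1]
    }
}
```

In this model the cleanup is triggered by the two events the description names (a request being satisfied, or its allocation number dropping to 0), so it does not depend on the queue having further pending requests.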
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)