[ https://issues.apache.org/jira/browse/YARN-8546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17426215#comment-17426215 ]
Eric Payne commented on YARN-8546:
----------------------------------
[~Tao Yang], [~cheersyang], if there are no objections, I'll go ahead and
backport this to branch-2.10 with the changes to the unit test.
> Resource leak caused by a reserved container being released more than once
> under async scheduling
> -------------------------------------------------------------------------------------------------
>
> Key: YARN-8546
> URL: https://issues.apache.org/jira/browse/YARN-8546
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler
> Affects Versions: 3.1.0
> Reporter: Weiwei Yang
> Assignee: Tao Yang
> Priority: Major
> Labels: global-scheduling
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8546.001.patch, YARN-8546.branch-2.10.001.patch
>
>
> I was able to reproduce this issue by starting a job that keeps requesting
> containers until it uses up the cluster's available resources. My cluster
> has 70200 vcores and each task requests 100 vcores, so I expected a total
> of 702 containers to be allocated, but only 701 were. The last container
> could not be allocated because the queue's used resource had been updated
> to more than 100%.
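
For reference, the arithmetic in the description above can be checked with a
short sketch. The figures are taken from the report; the variable names are
illustrative only, and this does not model the scheduler's actual accounting.

```python
# Figures from the issue description; names are hypothetical.
CLUSTER_VCORES = 70200
VCORES_PER_CONTAINER = 100

# Containers that should fit if accounting were exact.
expected_containers = CLUSTER_VCORES // VCORES_PER_CONTAINER

# Containers actually allocated, as reported.
observed_containers = 701

# Capacity that could not be allocated in the reported run.
missing_containers = expected_containers - observed_containers
stranded_vcores = missing_containers * VCORES_PER_CONTAINER

print(expected_containers, missing_containers, stranded_vcores)
```

The shortfall is exactly one container's worth of vcores, which matches the
report that the last container could not be allocated.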