[
https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tao Yang updated YARN-8771:
---------------------------
Attachment: YARN-8771.001.patch
> CapacityScheduler fails to unreserve when cluster resource contains empty
> resource type
> ---------------------------------------------------------------------------------------
>
> Key: YARN-8771
> URL: https://issues.apache.org/jira/browse/YARN-8771
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 3.2.0
> Reporter: Tao Yang
> Assignee: Tao Yang
> Priority: Critical
> Attachments: YARN-8771.001.patch
>
>
> We found this problem when the cluster was almost but not fully exhausted (93% used):
> the scheduler kept allocating for an app but always failed to commit. This can
> block requests from other apps and leave part of the cluster resource unusable.
> To reproduce this problem:
> (1) Use DominantResourceCalculator.
> (2) The cluster resource has an empty resource type, for example: gpu=0.
> (3) The scheduler allocates a container for app1, which has reserved containers and
> whose queue limit or user limit has been reached (used + required > limit).
> Relevant code in RegularContainerAllocator#assignContainer:
> {code:java}
> boolean needToUnreserve =
>     Resources.greaterThan(rc, clusterResource,
>         resourceNeedToUnReserve, Resources.none());
> {code}
> The value of resourceNeedToUnReserve can be <8GB, -6 cores, 0 gpu>, in which case
> {{Resources#greaterThan}} returns false when DominantResourceCalculator is used,
> so needToUnreserve is false even though memory still needs to be unreserved.
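
The failing comparison can be illustrated with a small standalone sketch. It does not use the Hadoop classes. It assumes that, when the cluster resource contains a zero-valued type, the calculator compares the two resources type by type and treats a mixed result (greater in one type, smaller in another) as "equal". The class UnreserveCheckSketch and its methods are hypothetical names used only for illustration, and anyComponentPositive is one possible alternative check, not necessarily what YARN-8771.001.patch does.

```java
// Simplified model, NOT the Hadoop implementation: resources are plain
// arrays in the order {memory MB, vcores, gpu}.
class UnreserveCheckSketch {

    // Assumed per-type comparison: returns 1 only if lhs is greater in at
    // least one type and smaller in none, -1 for the reverse, 0 otherwise.
    static int compareComponentwise(long[] lhs, long[] rhs) {
        boolean lhsGreater = false;
        boolean rhsGreater = false;
        for (int i = 0; i < lhs.length; i++) {
            if (lhs[i] > rhs[i]) {
                lhsGreater = true;
            } else if (lhs[i] < rhs[i]) {
                rhsGreater = true;
            }
        }
        if (lhsGreater && rhsGreater) {
            return 0; // mixed result is treated as "equal"
        }
        if (lhsGreater) {
            return 1;
        }
        return rhsGreater ? -1 : 0;
    }

    // Stand-in for the greaterThan check used to compute needToUnreserve.
    static boolean greaterThan(long[] lhs, long[] rhs) {
        return compareComponentwise(lhs, rhs) > 0;
    }

    // Hypothetical alternative: unreserve if any single type is still needed.
    static boolean anyComponentPositive(long[] resource) {
        for (long value : resource) {
            if (value > 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long[] resourceNeedToUnReserve = {8192, -6, 0}; // <8GB, -6 cores, 0 gpu>
        long[] none = {0, 0, 0};

        // Memory is greater but vcores is smaller, so the result is "equal".
        System.out.println(greaterThan(resourceNeedToUnReserve, none)); // false
        System.out.println(anyComponentPositive(resourceNeedToUnReserve)); // true
    }
}
```

Under these assumptions the negative vcores value cancels out the positive memory value, so needToUnreserve is false and the reserved container is never released.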