Tao Yang created YARN-8771:
------------------------------
Summary: CapacityScheduler fails to unreserve when cluster
resource contains empty resource type
Key: YARN-8771
URL: https://issues.apache.org/jira/browse/YARN-8771
Project: Hadoop YARN
Issue Type: Bug
Components: capacityscheduler
Affects Versions: 3.2.0
Reporter: Tao Yang
Assignee: Tao Yang
We found this problem when the cluster was almost, but not fully, exhausted
(93% used): the scheduler kept allocating for one app, but every commit failed.
This can block requests from other apps and leave part of the cluster resource
unused.
Steps to reproduce:
(1) use DominantResourceCalculator
(2) the cluster resource has an empty resource type, for example gpu=0
(3) the scheduler allocates a container for app1, which has reserved containers
and whose queue limit or user limit has been reached (used + required > limit).
Relevant code in RegularContainerAllocator#assignContainer:
{code:java}
boolean needToUnreserve =
Resources.greaterThan(rc, clusterResource,
resourceNeedToUnReserve, Resources.none());
{code}
The value of resourceNeedToUnReserve can be <8GB, -6 cores, 0 gpu>. With
DominantResourceCalculator, {{Resources#greaterThan}} returns false for this
value, so needToUnreserve is false and the scheduler does not unreserve.
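A simplified, hypothetical model of why the comparison fails is sketched below. It assumes that DominantResourceCalculator falls back to a per-type comparison when any cluster resource type is 0 (a share cannot be computed against a zero total), and that this fallback reports "neither greater" when one type is larger and another is smaller. Resources are modeled as plain long[] vectors {memoryMB, vcores, gpu}; the class and method names (UnreserveCheckSketch, compareByType, anyTypeAboveZero) are illustrative only and are not Hadoop APIs. The non-zero branch is reduced to comparing the largest share, which is a simplification of the real calculator.

{code:java}
import java.util.Arrays;

/**
 * Hypothetical, simplified model of
 * Resources.greaterThan(rc, clusterResource, lhs, Resources.none())
 * under DominantResourceCalculator. Resources are {memoryMB, vcores, gpu}.
 */
public class UnreserveCheckSketch {

    /** ASSUMPTION: a zero-valued cluster type triggers the per-type fallback. */
    static boolean hasZeroType(long[] cluster) {
        return Arrays.stream(cluster).anyMatch(v -> v == 0);
    }

    /**
     * Per-type fallback: 1 if lhs is larger in some type and never smaller,
     * -1 if smaller in some type and never larger, 0 if mixed or equal.
     */
    static int compareByType(long[] lhs, long[] rhs) {
        boolean lhsGreater = false;
        boolean rhsGreater = false;
        for (int i = 0; i < lhs.length; i++) {
            if (lhs[i] > rhs[i]) {
                lhsGreater = true;
            } else if (lhs[i] < rhs[i]) {
                rhsGreater = true;
            }
        }
        if (lhsGreater && rhsGreater) {
            return 0; // mixed signs: neither side is "greater"
        }
        return lhsGreater ? 1 : (rhsGreater ? -1 : 0);
    }

    /** Largest share of the cluster; only valid when no cluster type is 0. */
    static double dominantShare(long[] cluster, long[] r) {
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < r.length; i++) {
            max = Math.max(max, (double) r[i] / cluster[i]);
        }
        return max;
    }

    static boolean greaterThan(long[] cluster, long[] lhs, long[] rhs) {
        int cmp = hasZeroType(cluster)
                ? compareByType(lhs, rhs)
                : Double.compare(dominantShare(cluster, lhs), dominantShare(cluster, rhs));
        return cmp > 0;
    }

    /** Hypothetical alternative check: is any resource type still positive? */
    static boolean anyTypeAboveZero(long[] r) {
        return Arrays.stream(r).anyMatch(v -> v > 0);
    }

    public static void main(String[] args) {
        long[] cluster = {100 * 1024, 100, 0};   // gpu=0: empty resource type
        long[] needToUnreserve = {8 * 1024, -6, 0};
        long[] none = {0, 0, 0};
        System.out.println(greaterThan(cluster, needToUnreserve, none)); // false
        System.out.println(anyTypeAboveZero(needToUnreserve));           // true
    }
}
{code}

In this model, with gpu=0 the mixed-sign vector compares as neither greater nor smaller than zero, so the unreserve check stays false even though 8GB of memory is still needed; with a non-zero gpu total the same vector compares as greater. The anyTypeAboveZero check is only one hypothetical way to express the intended condition, not necessarily how this issue was resolved.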
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)