[
https://issues.apache.org/jira/browse/YARN-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337163#comment-15337163
]
Wangda Tan commented on YARN-4280:
----------------------------------
Thanks [~kshukla] for working on this patch.
My understanding of how this patch works (please correct me if I'm wrong):
When allocating:
- For application: if an app has pending resources and doesn't have enough
headroom, the app returns a CSAssignment with blockedResource > 0
- For leaf queue: if a leaf queue gets a CSAssignment with blockedResource > 0,
it sets queue.blockedResource accordingly.
- For parent queue: blocks the same amount as the leaf queue when used + blocked > limit
When releasing:
- For queues: blockedResource is deducted when a container is released.
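To make the above concrete, here is a minimal, self-contained sketch of that allocate/release bookkeeping. The class and method names (QueueBlockedBookkeeping, recordBlocked, etc.) are hypothetical stand-ins, not the actual classes in the patch, and resources are simplified to a single memory-in-MB value:

```java
// Hypothetical sketch of the blocked-resource bookkeeping described above;
// not the real YARN CSQueue/CSAssignment API.
public class QueueBlockedBookkeeping {
    private long usedMb;
    private long blockedMb;
    private final long limitMb;

    public QueueBlockedBookkeeping(long limitMb) {
        this.limitMb = limitMb;
    }

    // Allocation path: an app returned an assignment with blockedResource > 0,
    // so the leaf queue records the blocked amount.
    public void recordBlocked(long mb) {
        blockedMb += mb;
    }

    public void allocate(long mb) {
        usedMb += mb;
    }

    // Parent-queue check: skip further allocation when used + blocked > limit.
    public boolean isBlockedOverLimit() {
        return usedMb + blockedMb > limitMb;
    }

    // Release path: blocked resource is deducted when a container is released.
    public void releaseContainer(long mb) {
        usedMb -= mb;
        blockedMb = Math.max(0, blockedMb - mb);
    }

    public long getBlockedMb() {
        return blockedMb;
    }
}
```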
Problems I can see in the patch:
- The entire queue will be skipped when it has an application that fails the
checkHeadroom check:
{code}
if (!checkHeadroom(clusterResource, resourceLimits, required, node)) {
  if (LOG.isDebugEnabled()) {
    LOG.debug("cannot allocate required resource=" + required
        + " because of headroom");
  }
  return new ContainerAllocation(null, null, required,
      AllocationState.QUEUE_SKIPPED);
}
{code}
- Resources are blocked without checking whether the container could ever be
allocated. (For example, a request with hard locality, i.e. relaxLocality set
to false, where the host is invalid.)
- A leaf queue can block more resource than its maximum capacity allows. (A
queue with max = 100% and used = 95% can still block 10% more resource.)
- A queue's blocked resource is not updated after an application finishes.
- Blocked resource is cleared if the first app doesn't need any blocked
resource:
{code}
Resource blkedResource = assignment.getBlockedRequestResource();
if (blkedResource != null) {
  // ...
} else {
  assignment.setBlockedRequestResource(null);
  queueUsage.setBlocked(node.getPartition(), Resources.none());
}
{code}
- The ordering of queues should be updated after resource is blocked (in
PartitionedQueueComparator).
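The "block beyond maximum capacity" problem above is easy to quantify. A hypothetical fix (not in the patch) would cap the blocked amount against the queue's remaining headroom, so used + blocked can never exceed max; the sketch below uses plain MB values rather than the real Resource/Resources API:

```java
// Hypothetical capping helper; capBlocked is an illustrative name, not a
// method in the patch or in YARN.
public class BlockedCap {
    // Cap the requested blocked amount so that used + blocked <= max.
    // With max = 100 GB and used = 95 GB, at most 5 GB may be blocked,
    // instead of the full 10 GB the app asked for.
    public static long capBlocked(long usedMb, long requestedBlockMb, long maxMb) {
        long headroomMb = Math.max(0, maxMb - usedMb);
        return Math.min(requestedBlockMb, headroomMb);
    }
}
```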
I would suggest at least handling the following cases in the patch:
1. Only block resources when an allocation attempt finishes with
leaf-queue.used < max and any of its parents > max.
2. Do not block resources unless the resource can actually be used by the app
(a counterexample is the hard-locality case I mentioned above).
3. Blocked resources are cleared properly when an application completes or its
resource requests are updated.
4. Updates to blocked resources should be reflected in the ordering of queues.
Considering the issues in the patch, instead of adding a new blocked resource
to the queue, I think you could reconsider leveraging the existing reserved
container mechanism. We may need to add some extra logic to check whether a
reserved container has guaranteed resource (i.e., allocate the reserved
container only when no parent queue's max resource would be violated), but
everything else should stay the same.
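The guaranteed-resource check suggested above could look like the following sketch: walk the queue hierarchy from leaf to root and allocate the reserved container only if no queue on the path would exceed its max. The Queue class and hasGuaranteedResource are hypothetical simplifications, not the real CSQueue API:

```java
import java.util.List;

// Hypothetical sketch of the "guaranteed resource" check for a reserved
// container; queue limits are simplified to a single memory-in-MB value.
public class ReservedContainerCheck {
    // Minimal stand-in for a queue node (not the real CSQueue).
    public static class Queue {
        final long usedMb;
        final long maxMb;
        public Queue(long usedMb, long maxMb) {
            this.usedMb = usedMb;
            this.maxMb = maxMb;
        }
    }

    // A reserved container has guaranteed resource when allocating it would
    // not push any queue on the leaf-to-root path over its max resource.
    public static boolean hasGuaranteedResource(List<Queue> leafToRoot,
            long containerMb) {
        for (Queue q : leafToRoot) {
            if (q.usedMb + containerMb > q.maxMb) {
                return false;
            }
        }
        return true;
    }
}
```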
> CapacityScheduler reservations may not prevent indefinite postponement on a
> busy cluster
> ----------------------------------------------------------------------------------------
>
> Key: YARN-4280
> URL: https://issues.apache.org/jira/browse/YARN-4280
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Affects Versions: 2.6.1, 2.8.0, 2.7.1
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
> Attachments: YARN-4280.001.patch, YARN-4280.002.patch
>
>
> Consider the following scenario:
> There are 2 queues, A (25% of the total capacity) and B (75%); both can run at
> total cluster capacity. There are 2 applications: appX, which runs on Queue A,
> always asking for 1 GB containers (non-AM), and appY, which runs on Queue B,
> asking for 2 GB containers.
> The user limit is high enough for the application to reach 100% of the
> cluster resource.
> appX is running at total cluster capacity, full with 1G containers releasing
> only one container at a time. appY comes in with a request of 2GB container
> but only 1 GB is free. Ideally, since appY is in the underserved queue, it
> has higher priority and should reserve for its 2 GB request. Since this
> request puts the alloc+reserve above total capacity of the cluster,
> reservation is not made. appX comes in with a 1GB request and since 1GB is
> still available, the request is allocated.
> This can continue indefinitely causing priority inversion.
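The postponement in the quoted scenario can be reproduced with a toy simulation. The numbers (10 GB cluster, 1 GB appX containers, one release per round) and all names below are illustrative, assuming the reservation rule described in the issue, i.e. reserve only if alloc + reserve stays within total capacity:

```java
// Toy simulation of the indefinite-postponement scenario from the issue:
// appX keeps the cluster full of 1 GB containers, releasing one per round,
// while appY's 2 GB reservation is always rejected.
public class StarvationDemo {
    public static int appYAllocations(int rounds) {
        final long clusterMb = 10 * 1024;
        long usedMb = clusterMb;      // appX is running at total cluster capacity
        int appYAllocations = 0;
        for (int i = 0; i < rounds; i++) {
            usedMb -= 1024;           // appX releases one 1 GB container
            // Reservation rule from the issue: alloc + reserve must not
            // exceed total cluster capacity, so appY's 2 GB is rejected.
            if (usedMb + 2048 <= clusterMb) {
                usedMb += 2048;
                appYAllocations++;
            } else {
                usedMb += 1024;       // appX's next 1 GB request takes the free space
            }
        }
        return appYAllocations;
    }
}
```

Running this for any number of rounds, appY never gets a container, which is exactly the priority inversion the issue describes.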
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)