[ https://issues.apache.org/jira/browse/YARN-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267623#comment-15267623 ]
Wangda Tan commented on YARN-4280:
----------------------------------
Thanks [~jlowe], your comments all make sense to me.
[~kshukla], some thoughts (mostly a brain dump) that might help your prototype:
- Existing reserved container allocation goes to the leaf queue directly (which
assumes no more checks are required in parent queues). We should probably add
some fields to parent queues (such as which leaf queues have reserved resources)
and do allocation from top to bottom for reserved containers.
- If we allow reserving resources beyond a queue's max capacity, we should also
consider how to show that to users, because a user could ask questions like "why
can my queue use more than the total cluster/queue-max resource?"
- To solve the above issue, a different approach is to reserve resources
gradually, in other words, to add a "Reserving" state to the container. For
example, if a request needs 2G and only 1G is available in the cluster, it will
reserve 1G first; after a while, when another 1G becomes available, the
scheduler can finally set the container state from reserving to reserved.
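
To make the last idea concrete, here is a rough sketch of the gradual
reservation in Python. This is not CapacityScheduler code: the names
Reservation, State and offer are made up for illustration, and it assumes
resources are counted in whole GB of memory only.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    RESERVING = "RESERVING"  # hypothetical state: only part of the request is held
    RESERVED = "RESERVED"    # the full request is held


@dataclass
class Reservation:
    needed_gb: int      # size of the pending request
    held_gb: int = 0    # how much has been set aside so far

    @property
    def state(self) -> State:
        # The container stays in RESERVING until the whole request is covered.
        return State.RESERVED if self.held_gb >= self.needed_gb else State.RESERVING

    def offer(self, free_gb: int) -> int:
        """Take as much of free_gb as is still missing; return what is left over."""
        taken = min(free_gb, self.needed_gb - self.held_gb)
        self.held_gb += taken
        return free_gb - taken


r = Reservation(needed_gb=2)
r.offer(1)                       # only 1G is free in the cluster
print(r.state.value, r.held_gb)  # RESERVING 1
r.offer(1)                       # later, another 1G becomes available
print(r.state.value, r.held_gb)  # RESERVED 2
```

With this shape, alloc+reserve never exceeds what the cluster actually has,
since a reservation only ever holds resources that were free when it was
offered them.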
> CapacityScheduler reservations may not prevent indefinite postponement on a
> busy cluster
> ----------------------------------------------------------------------------------------
>
> Key: YARN-4280
> URL: https://issues.apache.org/jira/browse/YARN-4280
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Affects Versions: 2.6.1, 2.8.0, 2.7.1
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
>
> Consider the following scenario:
> There are 2 queues, A (25% of the total capacity) and B (75%); both can run at
> total cluster capacity. There are 2 applications: appX, which runs on Queue A
> and always asks for 1 GB containers (non-AM), and appY, which runs on Queue B
> and asks for 2 GB containers.
> The user limit is high enough for the application to reach 100% of the
> cluster resource.
> appX is running at total cluster capacity, filled with 1 GB containers, and
> releases only one container at a time. appY comes in with a request for a 2 GB
> container, but only 1 GB is free. Ideally, since appY is in the underserved
> queue, it has higher priority and should reserve for its 2 GB request. Since
> this request puts alloc+reserve above the total capacity of the cluster, the
> reservation is not made. appX comes in with a 1 GB request and, since 1 GB is
> still available, the request is allocated.
> This can continue indefinitely, causing priority inversion.
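
The scenario in the description can be replayed with a toy model. The sketch
below is a simplification, not the scheduler's logic: the function name, the
8 GB cluster size and the allow_reserve switch are assumptions made for
illustration. Each round appX releases one 1 GB container; without a
reservation the freed space goes straight back to appX.

```python
def rounds_until_appy_runs(total_gb, allow_reserve, max_rounds=100):
    """Return the round in which appY gets its 2 GB container, or None if never."""
    appx_used = total_gb   # appX fills the cluster with 1 GB containers
    held_for_appy = 0      # space kept back for appY's pending 2 GB request
    for round_no in range(1, max_rounds + 1):
        appx_used -= 1     # appX releases exactly one 1 GB container
        free = total_gb - appx_used - held_for_appy
        if allow_reserve:
            held_for_appy += free      # keep the freed space for appY
            if held_for_appy >= 2:
                return round_no        # appY's 2 GB request now fits
        else:
            # appY's 2 GB does not fit and no reservation is made,
            # so appX's next 1 GB request takes the free space.
            appx_used += free
    return None


print(rounds_until_appy_runs(8, allow_reserve=False))  # None
print(rounds_until_appy_runs(8, allow_reserve=True))   # 2
```

In this model appY is postponed for as long as the loop runs when nothing is
held back for it, and is served after two releases once freed space is kept.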