[ https://issues.apache.org/jira/browse/YARN-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15264079#comment-15264079 ]
Jason Lowe commented on YARN-4280:
----------------------------------
I'm not thrilled with the idea of preemption to solve this issue. Nothing
should have to be shot (i.e., work lost) to solve this problem. The real issue
is that we are _not_ placing a reservation and are instead allowing further
containers to be allocated.

To really solve it without resorting to shooting containers, we need to allow
reservations to exceed the cluster or queue capacity. As a user I should be
able to reserve up to my user limit, which already happens today as long as the
queue/cluster limit isn't hit. If we just allowed a reservation of at least
one container beyond the cluster/queue limit (as long as it's below the
user limit), then the application would make progress, which should solve this
particular issue. Yes, this would mean that used + reserved could be > total
capacity, but without it we are allowing apps to starve indefinitely.
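
To make the proposed rule concrete, here is a rough sketch. It is not
CapacityScheduler code: `can_reserve`, `allow_one_over_limit`, and every
parameter name are hypothetical, resources are reduced to a single memory
figure in GB, and "one container beyond the limit" is modeled, as one possible
reading, by permitting an over-limit reservation only while used + reserved has
not already passed the limit.

```python
def can_reserve(request_gb, used_gb, reserved_gb, limit_gb,
                user_used_gb, user_limit_gb, allow_one_over_limit):
    """Return True if a reservation for request_gb may be placed.

    Hypothetical model: limit_gb stands for the queue or cluster limit,
    and all quantities are memory in GB.
    """
    # The user limit applies under both the current and proposed behavior.
    if user_used_gb + request_gb > user_limit_gb:
        return False
    # The reservation fits under the queue/cluster limit: allowed today.
    if used_gb + reserved_gb + request_gb <= limit_gb:
        return True
    # Proposed relaxation (assumption): let a reservation push past the
    # limit, but only if the limit has not been exceeded already.
    return allow_one_over_limit and used_gb + reserved_gb <= limit_gb
```

With 9 GB used out of 10 GB and a 2 GB request, the call returns False when
`allow_one_over_limit` is False and True when it is True, provided the user
limit leaves room for the request.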
> CapacityScheduler reservations may not prevent indefinite postponement on a
> busy cluster
> ----------------------------------------------------------------------------------------
>
> Key: YARN-4280
> URL: https://issues.apache.org/jira/browse/YARN-4280
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Affects Versions: 2.6.1, 2.8.0, 2.7.1
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
>
> Consider the following scenario:
> There are 2 queues, A (25% of the total capacity) and B (75%); both can run at
> total cluster capacity. There are 2 applications: appX runs on Queue A,
> always asking for 1 GB containers (non-AM), and appY runs on Queue B, asking
> for 2 GB containers.
> The user limit is high enough for the application to reach 100% of the
> cluster resource.
> appX is running at total cluster capacity, filled with 1 GB containers, and
> releases only one container at a time. appY comes in with a request for a 2 GB
> container, but only 1 GB is free. Ideally, since appY is in the underserved
> queue, it has higher priority and should reserve for its 2 GB request. Since
> this request puts alloc+reserve above the total capacity of the cluster, the
> reservation is not made. appX comes in with a 1 GB request and, since 1 GB is
> still available, the request is allocated.
> This can continue indefinitely, causing priority inversion.
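
The scenario quoted above can be replayed with a toy loop. This is only a
sketch under simplifying assumptions, not the scheduler's logic: `simulate`
and its parameters are hypothetical names, memory is the only resource, appY is
always offered free resources before appX, and appX releases exactly one 1 GB
container per round.

```python
def simulate(total_gb, rounds, reserve_over_capacity):
    """Return the round in which appY gets its 2 GB container, or None."""
    used = total_gb - 1            # appX holds all but 1 GB at the start
    reserved_for_y = False
    for rnd in range(1, rounds + 1):
        free = total_gb - used
        # appY is in the underserved queue, so it is considered first.
        if free >= 2:
            return rnd             # appY's 2 GB container is allocated
        if reserve_over_capacity:
            # used + reserved may now exceed total_gb.
            reserved_for_y = True
        # With no reservation in place, appX grabs the free 1 GB.
        if not reserved_for_y and free >= 1:
            used += 1
        # appX releases one 1 GB container.
        used -= 1
    return None
```

In this model `simulate(10, 100, False)` never allocates appY's container
within the 100 rounds, while `simulate(10, 100, True)` allocates it in the
second round, because the reservation keeps appX from taking the freed space.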