[jira] [Commented] (YARN-4003) ReservationQueue inherit getAMResourceLimit() from LeafQueue, but behavior is not consistent

Carlo Curino (JIRA) Sat, 01 Aug 2015 18:58:16 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650572#comment-14650572
 ]


Carlo Curino commented on YARN-4003:
------------------------------------

[~leftnoteasy], this sounds like a good proposal to "tighten" this limit a bit 
more. Talking with [~atumanov], however, we spotted a possible corner case that 
I would like your opinion on.

Given a cluster of 1000 machines and a PlanQueue with 50% of it, we might have 
the following set of reservations: R1 and R2, both of size 250 at a certain 
time t_0, and R3, which has size 0 at t_0 (and will grow at some t_i > t_0). 
Formally, the capacity of the PlanQueue (500 containers) is exhausted by R1 and 
R2, and R3 has capacity=0, so your math would yield an amLimit of 0 for R3 
(i.e., no app can be started). However, R1 and R2 might be using only a 
fraction of their reserved capacity, so we might waste some resources. 
In this scenario, I would probably prefer R3 to get started opportunistically 
(and if the demand of R1 and R2 does not spike until t_i, when R3's capacity 
grows to >0, we are golden). 
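
To make the arithmetic concrete, here is a minimal sketch of the two options 
for R3 at t_0. All names are hypothetical (this is not the LeafQueue or 
ReservationQueue API), and two numbers are assumed purely for illustration: an 
AM limit of 10% and R1, R2 each using 100 of their 250 reserved containers.

```java
// Hypothetical sketch only: class and method names are NOT part of YARN.
final class AmLimitSketch {

    // Strict option: the AM limit is a percentage of the reservation's
    // own capacity at this instant.
    static int strictAmLimit(int reservationCapacity, int amPercent) {
        return reservationCapacity * amPercent / 100;
    }

    // Opportunistic option: also consider PlanQueue capacity that sibling
    // reservations have reserved but are not currently using.
    static int opportunisticAmLimit(int planCapacity, int usedBySiblings,
                                    int reservationCapacity, int amPercent) {
        int idle = Math.max(0, planCapacity - usedBySiblings);
        int base = Math.max(reservationCapacity, idle);
        return base * amPercent / 100;
    }

    public static void main(String[] args) {
        int planCapacity = 500;      // 50% of 1000 machines, as in the scenario
        int r3Capacity = 0;          // R3 at t_0
        int usedByR1AndR2 = 200;     // assumed: 100 used by each of R1 and R2
        int amPercent = 10;          // assumed AM percentage

        // prints 0: no AM can start in R3
        System.out.println(strictAmLimit(r3Capacity, amPercent));
        // prints 30: R3 can start AMs on the idle capacity
        System.out.println(opportunisticAmLimit(
                planCapacity, usedByR1AndR2, r3Capacity, amPercent));
    }
}
```

If R1 and R2 were using all 500 containers, the opportunistic variant would 
also return 0, so the two options differ only while the siblings are idle.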

We could clearly construct other scenarios in which letting the AM start only 
means we need to preempt it when R1 and R2 spike. 
This is a balancing act of "work preservation" vs "guaranteed execution". I am 
ok resolving it in either direction; what's your vote? (Anyone else with 
opinions on this?)



> ReservationQueue inherit getAMResourceLimit() from LeafQueue, but behavior is 
> not consistent
> --------------------------------------------------------------------------------------------
>
>                 Key: YARN-4003
>                 URL: https://issues.apache.org/jira/browse/YARN-4003
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Carlo Curino
>         Attachments: YARN-4003.patch
>
>
> The behavior inherited from LeafQueue (limit AM % based on capacity) is not a 
> good fit for ReservationQueue (which has highly dynamic capacity). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
