Sunil G commented on YARN-4003:

Hi [~curino]
bq. Assume a reservation queue R1 launches tons of AMs, and now another 
reservation queue R2 is stuck not being able to run any job
Yes, I agree that a single sudden spike of usage from one queue can make other 
queues starve. In this case, I agree with the current solution; it can 
definitely help to overcome starvation and also adhere to a limit (if not in 
the worst case).

The long-term plan sounds interesting. {{RM scheduling bandwidth}} per queue is 
definitely a good metric here. I assume this metric should always be less than 
the queue capacity (not max-capacity), is that so?

{{cost of scheduler bandwidth}} is a metric that has been pending for a long 
time. However, it is more like a ranking too: rank all apps based on their 
demand (resource request) rate, and aggregate that cost up to the queue level. 
The RM can use this to know which queues have a lower cost and which have a 
higher one. So in the reservation case, AMs can be restricted more in a queue 
with a high cost for scheduler bandwidth. Thoughts?
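
To make the ranking idea a bit more concrete, here is a rough sketch of what I 
have in mind. It is not scheduler code: {{QueueCostSketch}}, {{AppDemand}}, 
{{costPerQueue}}, {{amLimit}} and the {{MIN_FACTOR}} floor are all hypothetical 
names and values chosen for illustration, and the linear scaling rule is only 
one possible policy.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; none of these names exist in YARN.
final class QueueCostSketch {

    // Assumed floor so that a very costly queue can still launch some AMs.
    static final double MIN_FACTOR = 0.1;

    /** One running app: its queue and its resource-request rate (requests/sec). */
    static final class AppDemand {
        final String queue;
        final double requestsPerSecond;

        AppDemand(String queue, double requestsPerSecond) {
            this.queue = queue;
            this.requestsPerSecond = requestsPerSecond;
        }
    }

    /** Aggregate per-app request rates into a per-queue scheduler bandwidth cost. */
    static Map<String, Double> costPerQueue(List<AppDemand> apps) {
        Map<String, Double> cost = new HashMap<>();
        for (AppDemand app : apps) {
            cost.merge(app.queue, app.requestsPerSecond, Double::sum);
        }
        return cost;
    }

    /**
     * Shrink a base AM limit in proportion to the queue's share of the total
     * cost, never going below MIN_FACTOR of the base limit.
     */
    static double amLimit(double baseLimit, String queue, Map<String, Double> cost) {
        double total = 0.0;
        for (double c : cost.values()) {
            total += c;
        }
        if (total <= 0.0) {
            return baseLimit; // no demand observed, nothing to restrict
        }
        double share = cost.getOrDefault(queue, 0.0) / total;
        return baseLimit * Math.max(MIN_FACTOR, 1.0 - share);
    }
}
```

With this rule, a queue that accounts for most of the request rate ends up with 
a small AM limit, while a quiet queue keeps a limit close to the base value.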

> ReservationQueue inherit getAMResourceLimit() from LeafQueue, but behavior is 
> not consistent
> --------------------------------------------------------------------------------------------
>                 Key: YARN-4003
>                 URL: https://issues.apache.org/jira/browse/YARN-4003
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Carlo Curino
>            Assignee: Carlo Curino
>         Attachments: YARN-4003.patch
> The inherited behavior from LeafQueue (limit AM % based on capacity) is not a 
> good fit for ReservationQueue (that have highly dynamic capacity). 

This message was sent by Atlassian JIRA
