[
https://issues.apache.org/jira/browse/YARN-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069062#comment-15069062
]
Sunil G commented on YARN-4003:
-------------------------------
Hi [~curino]
bq.Assume a reservation queue R1 launches tons of AMs, and now another
reservation queue R2 is stuck not being able to run any job
Yes, I agree that one sudden spike of usage from a queue can make other
queues starve. In this case, I agree with the current solution; it can
definitely help to overcome starvation and also adhere to a limit (if not in
the worst case).
The long-term plan sounds interesting. {{RM scheduling bandwidth}} per queue is
definitely a good metric here. I assume this metric should always be less than
the queue capacity (not max-capacity), is that so?
{{cost of scheduler bandwidth}} is a metric that has been pending for a long
time. However, it is more like a ranking too: rank all apps by their demand
(resource request) rate, and aggregate that cost to the queue level. The RM can
use this to know which queue has less cost and which has more. So in the
reservation case, AMs can be restricted more in a queue with a high cost for
scheduler bandwidth. Thoughts?
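A rough sketch of the ranking idea above, to make it concrete. This is not YARN
code: the names {{AppDemand}}, {{queueCost}} and {{amLimitPercent}}, the base
and floor percentages, and the rule "shrink the AM limit in proportion to the
queue's share of total cost" are all assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SchedulerBandwidthSketch {

    /** Hypothetical sample: how fast one app issues resource requests. */
    static final class AppDemand {
        final String queue;
        final String appId;
        final double requestsPerSecond;

        AppDemand(String queue, String appId, double requestsPerSecond) {
            this.queue = queue;
            this.appId = appId;
            this.requestsPerSecond = requestsPerSecond;
        }
    }

    /** Sums each app's request rate into a per-queue scheduler-bandwidth cost. */
    static Map<String, Double> queueCost(List<AppDemand> apps) {
        Map<String, Double> cost = new LinkedHashMap<>();
        for (AppDemand app : apps) {
            cost.merge(app.queue, app.requestsPerSecond, Double::sum);
        }
        return cost;
    }

    /**
     * Assumed policy: a queue's AM limit shrinks as its share of the total
     * cost grows, but never drops below floorPercent.
     */
    static Map<String, Double> amLimitPercent(Map<String, Double> cost,
                                              double basePercent,
                                              double floorPercent) {
        double total = 0.0;
        for (double c : cost.values()) {
            total += c;
        }
        Map<String, Double> limits = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : cost.entrySet()) {
            double share = (total == 0.0) ? 0.0 : e.getValue() / total;
            double limit = Math.max(floorPercent, basePercent * (1.0 - share));
            limits.put(e.getKey(), limit);
        }
        return limits;
    }

    public static void main(String[] args) {
        // Made-up demand figures for two reservation queues.
        List<AppDemand> apps = Arrays.asList(
            new AppDemand("R1", "app_1", 10.0),
            new AppDemand("R1", "app_2", 20.0),
            new AppDemand("R2", "app_3", 10.0));

        Map<String, Double> cost = queueCost(apps);
        Map<String, Double> limits = amLimitPercent(cost, 0.10, 0.01);
        for (String queue : cost.keySet()) {
            System.out.println(queue + " cost=" + cost.get(queue)
                + " amLimit=" + limits.get(queue));
        }
    }
}
```

With these made-up numbers R1 carries the higher cost, so it ends up with the
tighter AM limit, while R2 keeps most of the base percentage.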
> ReservationQueue inherit getAMResourceLimit() from LeafQueue, but behavior is
> not consistent
> --------------------------------------------------------------------------------------------
>
> Key: YARN-4003
> URL: https://issues.apache.org/jira/browse/YARN-4003
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Carlo Curino
> Assignee: Carlo Curino
> Attachments: YARN-4003.patch
>
>
> The inherited behavior from LeafQueue (limit AM % based on capacity) is not a
> good fit for ReservationQueue (that have highly dynamic capacity).