Carlo Curino commented on YARN-4003:

[~sunilg], I think what you propose is possible, but the semantics would 
be a bit unpleasant. Assume a reservation queue R1 launches tons of AMs, and 
now another reservation queue R2 is stuck, unable to run any job. I 
wouldn't want that... I would rather let a reservation burn its entire 
capacity on AMs, while still allowing other reservation queues to launch 
their jobs. 

I think the cleaner solution (but definitely longer term) would be to treat the 
RM scheduling bandwidth as a separate (reservable) resource. So a queue (and 
similarly a reservation) can be configured to allow up to a certain number of 
AMs (which in turn bounds how much RM scheduling bandwidth I am devoting to 
this queue). This would also make a lot of sense for the federation effort, 
YARN-2915 (where we need to partition jobs across sub-clusters to protect the 
RMs from excessive AM-RM traffic due to the scale-out nature of federation). 
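
To make the idea concrete, here is a rough sketch of a per-queue cap on 
concurrently running AMs. It is only an illustration under my own assumptions: 
the class AmSlotLimiter and all of its methods are hypothetical and do not 
exist in YARN, and a real version would live inside the scheduler and handle 
locking, which this does not. 

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names are part of YARN.
// Each queue (or reservation) gets its own budget of AM "slots", so one
// queue exhausting its budget cannot block AM launches in another queue.
class AmSlotLimiter {
    // Configured maximum number of concurrently running AMs per queue.
    private final Map<String, Integer> maxAms = new HashMap<>();
    // Number of AMs currently running per queue.
    private final Map<String, Integer> runningAms = new HashMap<>();

    void setMaxAms(String queue, int limit) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must be >= 0");
        }
        maxAms.put(queue, limit);
    }

    // Takes one slot and returns true if the queue is below its limit;
    // returns false (and changes nothing) otherwise. Queues with no
    // configured limit are treated as having a limit of zero.
    boolean tryLaunchAm(String queue) {
        int limit = maxAms.getOrDefault(queue, 0);
        int running = runningAms.getOrDefault(queue, 0);
        if (running >= limit) {
            return false;
        }
        runningAms.put(queue, running + 1);
        return true;
    }

    // Releases one slot when an AM finishes; a no-op if none are running.
    void amFinished(String queue) {
        runningAms.computeIfPresent(queue, (q, n) -> n > 1 ? n - 1 : null);
    }

    int running(String queue) {
        return runningAms.getOrDefault(queue, 0);
    }
}
```

With limits of 2 for R1 and 1 for R2, R1 can fill both of its slots and R2 
can still launch its AM, which is the isolation between reservations that I 
am after. 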

What do folks generally think about explicitly capturing the cost of 
scheduler bandwidth (e.g., a service that launches 10 tasks and never asks for 
anything again is much less work for the RM than an MR job running many 
short-lived tasks)?

> ReservationQueue inherit getAMResourceLimit() from LeafQueue, but behavior is 
> not consistent
> --------------------------------------------------------------------------------------------
>                 Key: YARN-4003
>                 URL: https://issues.apache.org/jira/browse/YARN-4003
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Carlo Curino
>            Assignee: Carlo Curino
>         Attachments: YARN-4003.patch
> The inherited behavior from LeafQueue (limit AM % based on capacity) is not a 
> good fit for ReservationQueue (that have highly dynamic capacity). 

This message was sent by Atlassian JIRA
