[
https://issues.apache.org/jira/browse/YARN-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067269#comment-15067269
]
Carlo Curino commented on YARN-4003:
------------------------------------
[~sunilg], I think what you propose is possible, but the semantics would be a
bit unpleasant. Assume a reservation queue R1 launches tons of AMs, and another
reservation queue R2 is now stuck, unable to run any job. I wouldn't want
that... I would rather have a reservation burn its entire capacity on AMs, but
still allow other reservation queues to launch their jobs.
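To make the concern concrete, here is a minimal Java sketch. It assumes the
proposal amounts to a single AM budget shared by all reservation queues of a
plan. SharedAmPool, PerReservationAmLimit and their methods are hypothetical
illustrations, not CapacityScheduler classes, and AM sizes are in MB.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model (not the CapacityScheduler API): compares one AM budget
// shared by all reservation queues against a per-reservation AM budget that
// may use up to the reservation's whole capacity.
public class AmLimitSemantics {

    /** Shared pool: every reservation draws AM memory from one plan-wide budget. */
    static final class SharedAmPool {
        private final long limitMb;
        private long usedMb;

        SharedAmPool(long limitMb) { this.limitMb = limitMb; }

        boolean tryLaunchAm(String reservation, long amMb) {
            if (usedMb + amMb > limitMb) {
                return false; // budget exhausted, regardless of which reservation asks
            }
            usedMb += amMb;
            return true;
        }
    }

    /** Per-reservation: each reservation may spend up to its own capacity on AMs. */
    static final class PerReservationAmLimit {
        private final Map<String, Long> capacityMb = new HashMap<>();
        private final Map<String, Long> usedMb = new HashMap<>();

        void addReservation(String name, long capMb) {
            capacityMb.put(name, capMb);
            usedMb.put(name, 0L);
        }

        boolean tryLaunchAm(String reservation, long amMb) {
            long used = usedMb.get(reservation);
            if (used + amMb > capacityMb.get(reservation)) {
                return false; // only this reservation is affected
            }
            usedMb.put(reservation, used + amMb);
            return true;
        }
    }

    public static void main(String[] args) {
        long amMb = 1024;

        // R1 floods the shared pool with AMs; R2 then cannot start any job.
        SharedAmPool shared = new SharedAmPool(8 * 1024);
        while (shared.tryLaunchAm("R1", amMb)) { }
        System.out.println("shared pool, R2 can launch: " + shared.tryLaunchAm("R2", amMb));

        // With per-reservation limits, R1 only exhausts its own capacity.
        PerReservationAmLimit perRes = new PerReservationAmLimit();
        perRes.addReservation("R1", 8 * 1024);
        perRes.addReservation("R2", 8 * 1024);
        while (perRes.tryLaunchAm("R1", amMb)) { }
        System.out.println("per-reservation, R2 can launch: " + perRes.tryLaunchAm("R2", amMb));
    }
}
```

Running main prints false for the shared pool and true for the per-reservation
limit: R1 can burn its whole reservation on AMs without starving R2.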
I think the cleaner solution (though definitely longer term) would be to treat
RM scheduling bandwidth as a separate (reservable) resource. A queue (and
similarly a reservation) could then be configured to allow up to a certain
number of AMs (which in turn bounds how much RM scheduling bandwidth I am
devoting to this queue). This would also make a lot of sense for the
federation effort, YARN-2915, where we need to partition jobs across
sub-clusters to protect the RMs from excessive AM-RM traffic due to the
scale-out nature of federation.
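One way to picture "AM count as a reservable resource" is a resource vector
with a third dimension next to memory and vcores. The following Java sketch
assumes exactly that; AmSlotResource, Res, amSlots, configureQueue and
tryAllocate are hypothetical names, not part of YARN's Resource API, and
releasing slots when AMs finish is omitted for brevity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: "AM slots" as a third resource dimension, so a queue or
// reservation is granted a bounded number of concurrent AMs, which in turn
// bounds its share of RM scheduling bandwidth.
public class AmSlotResource {

    /** Immutable resource vector with a hypothetical amSlots dimension. */
    record Res(long memoryMb, int vcores, int amSlots) {
        Res plus(Res o) {
            return new Res(memoryMb + o.memoryMb, vcores + o.vcores, amSlots + o.amSlots);
        }

        boolean fitsIn(Res cap) {
            return memoryMb <= cap.memoryMb && vcores <= cap.vcores && amSlots <= cap.amSlots;
        }
    }

    private final Map<String, Res> capacity = new HashMap<>();
    private final Map<String, Res> used = new HashMap<>();

    void configureQueue(String queue, Res cap) {
        capacity.put(queue, cap);
        used.put(queue, new Res(0, 0, 0));
    }

    /** An AM container consumes one AM slot in addition to its memory/vcores. */
    boolean tryAllocate(String queue, long memoryMb, int vcores, boolean isAm) {
        Res next = used.get(queue).plus(new Res(memoryMb, vcores, isAm ? 1 : 0));
        if (!next.fitsIn(capacity.get(queue))) {
            return false;
        }
        used.put(queue, next);
        return true;
    }

    public static void main(String[] args) {
        AmSlotResource rm = new AmSlotResource();
        rm.configureQueue("reservation-1", new Res(64 * 1024, 64, 2));
        System.out.println(rm.tryAllocate("reservation-1", 1024, 1, true));  // true
        System.out.println(rm.tryAllocate("reservation-1", 1024, 1, true));  // true
        System.out.println(rm.tryAllocate("reservation-1", 1024, 1, true));  // false: AM slots exhausted
        System.out.println(rm.tryAllocate("reservation-1", 4096, 1, false)); // true: task containers still fit
    }
}
```

The AM bound is independent of how large the reservation's memory capacity
happens to be at any moment, which is what makes it attractive for
reservations whose capacity changes over time. The sketch uses a Java record,
so it needs Java 16 or later.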
What are folks generally thinking about explicitly capturing the cost of
scheduler bandwidth? (E.g., a service that launches 10 tasks and never asks for
anything again is much less work for the RM than an MR job running many, many
short-lived tasks.)
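As a rough illustration of such a cost, the Java sketch below approximates RM
work as container allocations plus a small per-heartbeat overhead. The cost
model, the weights COST_PER_ALLOCATION and COST_PER_HEARTBEAT, the 1 s
heartbeat interval and the 5,000-task figure are all hypothetical assumptions
chosen only to contrast the two workloads.

```java
// Hypothetical cost model for RM scheduling bandwidth: an application's cost is
// approximated as allocations requested plus a per-heartbeat overhead.
// Weights are illustrative only, not measured values.
public class SchedulingCost {

    static final double COST_PER_ALLOCATION = 1.0;  // hypothetical weight
    static final double COST_PER_HEARTBEAT = 0.01;  // hypothetical weight

    /**
     * @param allocations   containers requested over the accounting window
     * @param windowSeconds length of the accounting window
     * @param heartbeatSec  assumed AM-RM heartbeat interval in seconds
     */
    static double cost(long allocations, long windowSeconds, long heartbeatSec) {
        long heartbeats = windowSeconds / heartbeatSec;
        return allocations * COST_PER_ALLOCATION + heartbeats * COST_PER_HEARTBEAT;
    }

    public static void main(String[] args) {
        long hour = 3600;
        // Long-running service: 10 containers once, then only heartbeats.
        double service = cost(10, hour, 1);
        // MR job with many short-lived tasks: e.g. 5,000 task containers in the hour.
        double mrJob = cost(5_000, hour, 1);
        System.out.printf("service=%.1f mr=%.1f%n", service, mrJob);
    }
}
```

With these assumed weights the service costs 46.0 units per hour versus 5036.0
for the MR job; the allocation term dominates, which is the asymmetry the
question above points at.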
> ReservationQueue inherit getAMResourceLimit() from LeafQueue, but behavior is
> not consistent
> --------------------------------------------------------------------------------------------
>
> Key: YARN-4003
> URL: https://issues.apache.org/jira/browse/YARN-4003
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Carlo Curino
> Assignee: Carlo Curino
> Attachments: YARN-4003.patch
>
>
> The inherited behavior from LeafQueue (limit AM % based on capacity) is not a
> good fit for ReservationQueues (which have highly dynamic capacity).