[
https://issues.apache.org/jira/browse/YARN-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848006#comment-13848006
]
Sandy Ryza commented on YARN-1495:
----------------------------------
bq. We have to touch RMApp etc before hitting scheduler as state in RM is
partitioned inside and outside scheduler.
Sorry, I wasn't clear - definitely agree we need to go through RM app, just was
wondering whether to do it with events or synchronously. Thanks for the heads
up on the race condition - will watch out for that.
bq. The paradigm followed is a multi-phase request
An issue with doing a multi-phase request is that, if the move fails, we would
like to return an appropriate error message with the reason to the client, and
the reason can go as far down as the scheduler. We could give the client a
request ID that they could come back with to find the result, but that kind of
seems like overkill to me. While async/multi-phase requests 100% make sense to
me in situations like the AMRM protocol where requests come in all the time,
moves will normally be human-initiated requests that come with very low
frequency. I'll write the code with events, which will allow us to take either
the blocking (with a Future) or non-blocking approach.
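To make the last point concrete, here is a rough sketch of how an event-based move could still give the client a blocking call that carries the failure reason back. Everything in it is hypothetical and not actual RM code: `MoveAppEvent`, `submitMove`, `moveBlocking`, and the stand-in `schedulerMove` are names made up for illustration, and a plain queue plus a thread stands in for the dispatcher.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class MoveAppSketch {
    /** Hypothetical event: the request plus a future that will hold the outcome. */
    static final class MoveAppEvent {
        final String appId;
        final String targetQueue;
        final CompletableFuture<Void> result = new CompletableFuture<>();

        MoveAppEvent(String appId, String targetQueue) {
            this.appId = appId;
            this.targetQueue = targetQueue;
        }
    }

    /** Stand-in for the dispatcher's event queue. */
    static final BlockingQueue<MoveAppEvent> EVENTS = new LinkedBlockingQueue<>();

    /** Stand-in for the scheduler: only "queueB" exists in this sketch. */
    static void schedulerMove(MoveAppEvent e) {
        if (!"queueB".equals(e.targetQueue)) {
            throw new IllegalStateException("queue " + e.targetQueue + " does not exist");
        }
    }

    /** Starts a daemon thread that handles events and completes their futures. */
    static Thread startDispatcher() {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    MoveAppEvent e = EVENTS.take();
                    try {
                        schedulerMove(e);
                        e.result.complete(null);
                    } catch (RuntimeException ex) {
                        // The reason from the scheduler travels back via the future.
                        e.result.completeExceptionally(ex);
                    }
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Non-blocking entry point: enqueue the event and hand back the future. */
    static CompletableFuture<Void> submitMove(String appId, String targetQueue) {
        MoveAppEvent e = new MoveAppEvent(appId, targetQueue);
        EVENTS.add(e);
        return e.result;
    }

    /** Blocking entry point: returns null on success, else the failure reason. */
    static String moveBlocking(String appId, String targetQueue) throws Exception {
        try {
            submitMove(appId, targetQueue).get(5, TimeUnit.SECONDS);
            return null;
        } catch (ExecutionException ex) {
            return ex.getCause().getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        startDispatcher();
        System.out.println(moveBlocking("app_1", "queueB"));
        System.out.println(moveBlocking("app_1", "queueC"));
    }
}
```

The same `submitMove` serves both approaches: a blocking caller waits on the future, while a non-blocking caller could attach a callback to it instead.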
> Allow moving apps between queues
> --------------------------------
>
> Key: YARN-1495
> URL: https://issues.apache.org/jira/browse/YARN-1495
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: scheduler
> Affects Versions: 2.2.0
> Reporter: Sandy Ryza
> Assignee: Sandy Ryza
>
> This is an umbrella JIRA for work needed to allow moving YARN applications
> from one queue to another. The work will consist of additions in the command
> line options, additions in the client RM protocol, and changes in the
> schedulers to support this.
> I have a picture of how this should function in the Fair Scheduler, but I'm
> not familiar enough with the Capacity Scheduler to have one for it.
> Ultimately, the decision on whether an application can be moved should go
> down to the scheduler - some schedulers may wish not to support this at all.
> However, schedulers that do support it should share some common semantics
> around ACLs and what happens to running containers.
> Here is how I see the general semantics working out:
> * A move request is issued by the client. After it gets past ACLs, the
> scheduler checks whether executing the move will violate any constraints. For
> the Fair Scheduler, these would be queue maxRunningApps and queue
> maxResources constraints
> * All running containers are transferred from the old queue to the new queue
> * All outstanding requests are transferred from the old queue to the new queue
> Here is how I see the ACLs of this working out:
> * To move an app from a queue a user must have modify access on the app or
> administer access on the queue
> * To move an app to a queue a user must have submit access on the queue or
> administer access on the queue
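The ACL rules and the constraint check from the quoted description could be sketched roughly as below. This is only an illustration under assumptions: `QueueState`, `checkMove`, and the boolean access flags are hypothetical names rather than scheduler APIs, only the maxRunningApps constraint is modelled (maxResources is left out), and the app being moved is assumed to count as one running app in the target queue.

```java
public class MoveAclSketch {
    /** Hypothetical view of the target queue: only what this check needs. */
    static final class QueueState {
        final String name;
        final int runningApps;
        final int maxRunningApps;

        QueueState(String name, int runningApps, int maxRunningApps) {
            this.name = name;
            this.runningApps = runningApps;
            this.maxRunningApps = maxRunningApps;
        }
    }

    /**
     * Returns null if the move may proceed, else a reason for the client.
     * ACLs are checked first, then the scheduler constraint.
     */
    static String checkMove(boolean modifyOnApp,
                            boolean administerOnSource,
                            boolean submitOnTarget,
                            boolean administerOnTarget,
                            QueueState target) {
        // Moving out: modify access on the app or administer access on its queue.
        if (!(modifyOnApp || administerOnSource)) {
            return "no modify access on app or administer access on source queue";
        }
        // Moving in: submit or administer access on the target queue.
        if (!(submitOnTarget || administerOnTarget)) {
            return "no submit or administer access on target queue";
        }
        // Constraint: the moved app must not push the target past maxRunningApps.
        if (target.runningApps + 1 > target.maxRunningApps) {
            return "queue " + target.name + " is at maxRunningApps ("
                    + target.maxRunningApps + ")";
        }
        return null;
    }

    public static void main(String[] args) {
        QueueState roomy = new QueueState("queueB", 1, 2);
        QueueState full = new QueueState("queueB", 2, 2);
        System.out.println(checkMove(true, false, true, false, roomy));
        System.out.println(checkMove(false, false, true, true, roomy));
        System.out.println(checkMove(true, false, true, false, full));
    }
}
```

Returning the reason as a string mirrors the concern raised in the comment: a rejection that originates in the scheduler should be something the client can be shown.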
--
This message was sent by Atlassian JIRA
(v6.1.4#6159)