[
https://issues.apache.org/jira/browse/YARN-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847108#comment-13847108
]
Sandy Ryza commented on YARN-1495:
----------------------------------
Thanks for taking a look, Vinod.
bq. Any specific use-case? Example where it can be used? To justify this isn't
feature creep.
Yeah, we've seen requests for this a few times. I think the most common
scenario is that someone's job is running slowly because of the queue that it's
in, and the job needs to be moved to a queue where it can complete more
quickly. This can happen because the job is taking longer than expected and a
deadline is approaching, the original queue is fuller than expected, the job
was submitted incorrectly in the first place but has made some progress, or for
a number of other reasons.
bq. What happens when scheduling-constraints are violated? The client will just
get an error? It kind of depends on the type of scheduling constraint.
Not sure how this should play out for the Capacity Scheduler, but for the Fair
Scheduler constraints I mentioned in the description I think the client should
get an error. I suppose another option would be to kill containers until the
constraints are satisfied, but I think this is a lot more work and not
clearly better behavior.
bq. Who initiates the move any regular user or just admins?
My opinion is any regular user, within ACLs. I.e. if I could kill my job and
resubmit it to a different queue, I should be able to move it.
bq. Only running apps can be moved?
I don't see a reason that we shouldn't be able to move an app that has been
submitted, but not accepted, or that is very close to completion. In some
cases we may not need to touch the scheduler. There are definitely race
conditions we need to be careful of here.
bq. Apps may be in the process of submitting new requests. What happens to
them? I guess queue-move and new-requests should be synchronized.
Right.
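To illustrate what synchronizing queue moves with new requests could look like, here is a minimal sketch. It is not YARN code: the class `SyncedAppSketch`, its methods, and the per-queue pending counter are all hypothetical names made up for this example, and a single monitor per app is just one possible way to get the ordering.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical model of one app's outstanding requests; not a YARN class. */
class SyncedAppSketch {
    // Hypothetical scheduler-side accounting: outstanding requests per queue.
    private final Map<String, Integer> pendingByQueue = new ConcurrentHashMap<>();
    private String queue;
    private int pending;

    SyncedAppSketch(String initialQueue) {
        this.queue = initialQueue;
    }

    /** Records a new request against whichever queue the app is in right now. */
    synchronized void addRequest() {
        pending++;
        pendingByQueue.merge(queue, 1, Integer::sum);
    }

    /**
     * Moves the app and all of its outstanding requests to newQueue.
     * It holds the same monitor as addRequest, so a request is counted
     * either wholly before the move or wholly after it.
     */
    synchronized void moveTo(String newQueue) {
        pendingByQueue.merge(queue, -pending, Integer::sum);
        pendingByQueue.merge(newQueue, pending, Integer::sum);
        queue = newQueue;
    }

    synchronized String currentQueue() {
        return queue;
    }

    /** Outstanding requests currently charged to the given queue. */
    int pendingIn(String queueName) {
        return pendingByQueue.getOrDefault(queueName, 0);
    }
}
```

Because both methods lock the same object, no request can end up charged to the old queue after the move has completed. A real scheduler would likely need finer-grained locking than this.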
> Allow moving apps between queues
> --------------------------------
>
> Key: YARN-1495
> URL: https://issues.apache.org/jira/browse/YARN-1495
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: scheduler
> Affects Versions: 2.2.0
> Reporter: Sandy Ryza
> Assignee: Sandy Ryza
>
> This is an umbrella JIRA for work needed to allow moving YARN applications
> from one queue to another. The work will consist of additions in the command
> line options, additions in the client RM protocol, and changes in the
> schedulers to support this.
> I have a picture of how this should function in the Fair Scheduler, but I'm
> not familiar enough with the Capacity Scheduler to say the same there.
> Ultimately, the decision as to whether an application can be moved should go
> down to the scheduler - some schedulers may wish not to support this at all.
> However, schedulers that do support it should share some common semantics
> around ACLs and what happens to running containers.
> Here is how I see the general semantics working out:
> * A move request is issued by the client. After it gets past ACLs, the
> scheduler checks whether executing the move will violate any constraints. For
> the Fair Scheduler, these would be the queue maxRunningApps and queue
> maxResources constraints.
> * All running containers are transferred from the old queue to the new queue
> * All outstanding requests are transferred from the old queue to the new queue
> Here is how I see the ACLs of this working out:
> * To move an app from a queue a user must have modify access on the app or
> administer access on the queue
> * To move an app to a queue a user must have submit access on the queue or
> administer access on the queue
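
The move semantics and ACL rules in the description above can be sketched roughly as follows. This is an illustrative model only, not the scheduler implementation: `SketchQueue`, `SketchApp`, `MoveRejectedException` and `QueueMoveSketch` are hypothetical names. The sketch assumes that app ownership stands in for "modify access on the app", and that maxResources can be simplified to a single memory figure.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a scheduler queue; not a YARN class. */
class SketchQueue {
    final String name;
    final int maxRunningApps;  // models the Fair Scheduler maxRunningApps limit
    final int maxResourcesMb;  // models maxResources, simplified to memory only
    final Set<String> submitUsers = new HashSet<>();
    final Set<String> adminUsers = new HashSet<>();
    final Map<String, SketchApp> apps = new HashMap<>();

    SketchQueue(String name, int maxRunningApps, int maxResourcesMb) {
        this.name = name;
        this.maxRunningApps = maxRunningApps;
        this.maxResourcesMb = maxResourcesMb;
    }

    void add(SketchApp app) {
        apps.put(app.id, app);
        app.queue = this;
    }

    /** Memory held by running containers of all apps in this queue. */
    int usedMb() {
        int total = 0;
        for (SketchApp app : apps.values()) {
            total += app.runningMb;
        }
        return total;
    }
}

/** Hypothetical app: running containers and outstanding requests travel with it. */
class SketchApp {
    final String id;
    final String owner;   // assumption: the owner has modify access on the app
    final int runningMb;  // memory held by running containers
    final int pendingMb;  // memory in outstanding requests
    SketchQueue queue;

    SketchApp(String id, String owner, int runningMb, int pendingMb) {
        this.id = id;
        this.owner = owner;
        this.runningMb = runningMb;
        this.pendingMb = pendingMb;
    }
}

class MoveRejectedException extends RuntimeException {
    MoveRejectedException(String message) {
        super(message);
    }
}

class QueueMoveSketch {
    static void move(String user, SketchApp app, SketchQueue target) {
        SketchQueue source = app.queue;
        if (source == target) {
            return; // nothing to do
        }
        // ACLs: modify on the app or administer on the source queue to leave,
        // submit or administer on the target queue to enter.
        boolean canLeave = user.equals(app.owner) || source.adminUsers.contains(user);
        boolean canEnter = target.submitUsers.contains(user) || target.adminUsers.contains(user);
        if (!canLeave || !canEnter) {
            throw new MoveRejectedException("ACL check failed for " + user);
        }
        // Constraint checks: the client gets an error, nothing is killed.
        if (target.apps.size() + 1 > target.maxRunningApps) {
            throw new MoveRejectedException("maxRunningApps exceeded in " + target.name);
        }
        if (target.usedMb() + app.runningMb > target.maxResourcesMb) {
            throw new MoveRejectedException("maxResources exceeded in " + target.name);
        }
        // Transfer: containers and outstanding requests are fields of the app,
        // so re-homing the app moves both.
        source.apps.remove(app.id);
        target.add(app);
    }
}
```

A rejected move leaves the app in its original queue. This matches the option of returning an error to the client rather than killing containers.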