Craig Welch commented on YARN-1039:

That's along the lines of what I was thinking after talking with [~xgong] and 
looking around a bit more.

It sounds like we need to be able to ask the resource manager for a container 
suited to long-lived cases (not on a spot instance, for example), both when 
launching the AM container (in the ApplicationSubmissionContext) and when the 
ApplicationMaster wants to get a container later on (ResourceRequestProto). 
In both cases this is really a scheduling hint for the resource manager.
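
As a rough illustration of what such a scheduling hint could mean for node 
selection, here is a minimal sketch. It is not YARN code: the Node class, its 
transient flag, and eligible_nodes are hypothetical names, and treating the 
hint as a hard filter (rather than a preference) is an assumption made only 
to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical view of a cluster node as seen by a scheduler."""
    name: str
    transient: bool  # assumption: True for e.g. a spot-priced cloud instance


def eligible_nodes(nodes, long_lived):
    """Return the nodes a scheduler might consider for a container request.

    A long-lived request skips transient nodes; any other request may be
    placed anywhere.
    """
    if not long_lived:
        return list(nodes)
    return [n for n in nodes if not n.transient]


nodes = [Node("n1", transient=False), Node("spot-1", transient=True)]
print([n.name for n in eligible_nodes(nodes, long_lived=True)])
print([n.name for n in eligible_nodes(nodes, long_lived=False)])
```

The same check would apply whether the request is for the AM container or 
for a container the ApplicationMaster asks for later.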

We also need to be able to mark an application as long running for other 
reasons (adjusting progress bar behavior, etc.).

We need to be able to tell the node manager that a container will be long 
running when it is launched (to adjust logging behavior, etc.). An application 
master may launch containers unlike itself (for example, containers that are 
not long running when the AM itself is), which it can do because the 
application master can specify whatever it wants in the ResourceRequestProto.

I do think it would be good to keep the interface as consistent as possible, 
and we should probably have at least a rough idea of the whole picture before 
making additions.

I suggest this:
- An enum of scheduling constraints, initially including only LONG_RUNNING and 
  later affinity, etc. This is solely for node selection by the resource 
  manager.
- A repeated field of this enum on the ResourceRequestProto and the 
  ApplicationSubmissionContext. In both cases this is purely a constraint on 
  where the container is placed (the application master's container in the 
  latter case).
- Go ahead and use a tag [~zjshen] on the application submission to indicate 
  that an application is long running for purposes of display (things like the 
  progress bar, etc.). That seems to be an appropriate use for application 
  tags.
- A boolean value on the ContainerLaunchContextProto to indicate that it is 
  long running.
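
To make the shape of the suggestion concrete, here is a minimal sketch of the 
data model. It only mirrors the names used in this comment; it is not the 
actual YARN API or protobuf definition. The field names (constraints, 
am_constraints, tags, long_running), the memory_mb field, and the 
"long-running" tag string are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class SchedulingConstraint(Enum):
    """Node-selection constraints; affinity etc. could be added later."""
    LONG_RUNNING = 1


@dataclass
class ResourceRequest:
    """Stand-in for ResourceRequestProto with a repeated constraint field."""
    memory_mb: int
    constraints: list = field(default_factory=list)


@dataclass
class ContainerLaunchContext:
    """Stand-in for ContainerLaunchContextProto with the proposed boolean."""
    command: str
    long_running: bool = False


@dataclass
class ApplicationSubmissionContext:
    """Carries AM placement constraints plus display-oriented tags."""
    name: str
    am_launch: ContainerLaunchContext
    am_constraints: list = field(default_factory=list)
    tags: set = field(default_factory=set)


# A long-running application whose AM must be placed on a suitable node.
submission = ApplicationSubmissionContext(
    name="my-service",
    am_launch=ContainerLaunchContext("run-am", long_running=True),
    am_constraints=[SchedulingConstraint.LONG_RUNNING],
    tags={"long-running"},  # hypothetical tag value, used only for display
)

# The AM can still ask for a short-lived container unlike itself.
batch_request = ResourceRequest(memory_mb=1024)
batch_launch = ContainerLaunchContext("run-batch-task")

print(submission.am_constraints, batch_request.constraints)
```

In this sketch the enum list drives placement by the resource manager, the 
tag drives display, and the boolean is what the node manager would see at 
launch time, so the three concerns stay separate.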

There are some tradeoffs in this approach, but I think it's good overall:
- All the variations we have identified are covered.
- It is consistent in how it handles launching a long-running container for 
  both the application master and other containers.
- It is also consistent with the approach to date with respect to the 
  application submission context and the resource request (where items needed 
  for launching the application master container are added to the application 
  submission context).
- When other scheduler constraints relevant for an application master are 
  introduced later, the API will not need to change to accommodate them (other 
  than adding them to the enum).
- We reuse the application tag for display and similar purposes, and in 
  general are adding the minimum necessary to cover the identified cases.

(I thought it was simplest to just use a boolean on the container launch 
context. In that case the behavior is one way or the other, and other 
scheduling constraints don't apply.)


> Add parameter for YARN resource requests to indicate "long lived"
> -----------------------------------------------------------------
>                 Key: YARN-1039
>                 URL: https://issues.apache.org/jira/browse/YARN-1039
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 3.0.0, 2.1.1-beta
>            Reporter: Steve Loughran
>            Assignee: Craig Welch
>            Priority: Minor
>         Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch
> A container request could support a new parameter "long-lived". This could be 
> used by a scheduler that would know not to host the service on a transient 
> (cloud: spot priced) node.
> Schedulers could also decide whether or not to allocate multiple long-lived 
> containers on the same node

