[ 
https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163708#comment-15163708
 ] 

Bikas Saha commented on YARN-1040:
----------------------------------

I am not sure we need to place (somewhat artificial) constraints on the app 
when it's not clear that it practically affects YARN.

1) Container with no process should be allowed. Apps could terminate all 
running tasks of version A, then start running tasks of version B when they are 
not backwards compatible.
2) Container should be allowed to run multiple processes. This is similar to 
the existing process spawning more processes. It is different from that in the 
sense that the NM has to add the new process to existing monitoring/cgroups etc.
3) Startprocess should be allowed with no process actually started. This will 
allow apps to localize new resources to an existing container. Alternatively, 
we could create a new localization API that's delinked from starting the 
process. But re-localization is an important related feature that we should 
look at supporting via this work, because currently it does not work since it's 
tied to start process.
4) Most current apps are already communicating directly with their tasks and 
hence can shut them down when they are not needed. However, as suggested 
above, it may be useful for the NM to provide a feature whereby the previous 
task can be shut down when a new task request is received. Alternatively, the NM 
could provide a stopProcess API to make that explicit.

IMO all of this should be allowed. The timeline could be different with some 
being allowed earlier and some later based on implementation effort.

Thinking ahead, it may be useful for the NM to accept a series of API calls 
within the same RPC (with the current mechanism supported as a single command 
entity for backwards compatibility). Then we will not have to build a lot of 
logic into the NM. The app can get all features by composing a multi-command 
entity.
E.g.
Current start process = {acquire, localize, start} // where acquire means start 
container
Current shutdown process = {stop, release} // where release means give up 
container
Only localize = {localize}
Start another process = {localize, start}
Start another process after shutting down first process = {stop, start} or 
{stop, localize, start}
Start another process and then shut down the first process = {start, stop}
New container shutdown = {release} // at this point there may be 0 or more 
processes running, which will be stopped
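
To make the composition idea concrete, here is a rough Java sketch of how an 
ordered multi-command entity could be applied against a container's state. 
Everything in it is hypothetical: CommandBatchSketch, Command, ContainerState 
and apply are illustration-only names and not part of any YARN API. The rules 
are also assumptions, in particular that "stop" fails when nothing is running 
and that each "stop" stops exactly one process.

```java
import java.util.List;

/** Hypothetical sketch; none of these names exist in the YARN API. */
final class CommandBatchSketch {

    /** The five commands used in the examples above. */
    enum Command { ACQUIRE, LOCALIZE, START, STOP, RELEASE }

    /** Minimal per-container state the NM might track (assumed). */
    static final class ContainerState {
        boolean held;          // true between ACQUIRE and RELEASE
        int runningProcesses;  // 0 or more processes inside the container
        int localizations;     // number of localize steps applied so far
    }

    /** Applies the commands in order and fails on the first invalid step. */
    static ContainerState apply(ContainerState s, List<Command> batch) {
        for (Command c : batch) {
            if (c == Command.ACQUIRE) {
                if (s.held) {
                    throw new IllegalStateException("container already acquired");
                }
                s.held = true;
                continue;
            }
            if (!s.held) {
                throw new IllegalStateException(c + " requires an acquired container");
            }
            switch (c) {
                case LOCALIZE:
                    s.localizations++;       // no process is started or stopped
                    break;
                case START:
                    s.runningProcesses++;    // multiple processes are allowed
                    break;
                case STOP:
                    if (s.runningProcesses == 0) {
                        throw new IllegalStateException("nothing to stop");
                    }
                    s.runningProcesses--;    // container stays held at 0 processes
                    break;
                case RELEASE:
                    s.runningProcesses = 0;  // any remaining processes are stopped
                    s.held = false;
                    break;
                default:
                    throw new AssertionError(c);
            }
        }
        return s;
    }
}
```

With this shape, the current start process is the batch {ACQUIRE, LOCALIZE, 
START}, a later {STOP, LOCALIZE, START} swaps the process in place, and 
{RELEASE} ends the container regardless of how many processes remain.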


> De-link container life cycle from the process and add ability to execute 
> multiple processes in the same long-lived container
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1040
>                 URL: https://issues.apache.org/jira/browse/YARN-1040
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 3.0.0
>            Reporter: Steve Loughran
>
> The AM should be able to exec >1 process in a container, rather than have the 
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, 
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that 
> something could be run in the container while a long-lived process was 
> already running. This can be useful in monitoring and reconfiguring the 
> long-lived process, as well as shutting it down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
