[ https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163708#comment-15163708 ]
Bikas Saha commented on YARN-1040:
----------------------------------

I am not sure we need to place (somewhat artificial) constraints on the app when it's not clear that they practically affect YARN.

1) A container with no process should be allowed. Apps could terminate all running tasks of version A, then start running tasks of version B when the two versions are not backwards compatible.

2) A container should be allowed to run multiple processes. This is similar to the existing process spawning more processes. It differs in that the NM has to add the new process to the existing monitoring, cgroups, etc.

3) startProcess should be allowed with no process actually started. This will allow apps to localize new resources to an existing container. Alternatively, we could create a new localization API that's delinked from starting the process. Either way, re-localization is an important related feature that we should look at supporting via this work, because it currently does not work: localization is tied to starting a process.

4) Most current apps already communicate directly with their tasks and hence can shut them down when they are not needed. However, as suggested above, it may be useful for the NM to provide a feature whereby the previous task is shut down when a new task request is received. Alternatively, the NM could provide a stopProcess API to make that explicit.

IMO all of this should be allowed. The timeline could differ, with some features allowed earlier and some later based on implementation effort.

Thinking ahead, it may be useful for the NM to accept a series of API calls within the same RPC (with the current mechanism supported as a single command entity for backwards compatibility). Then we will not have to build a lot of logic into the NM. The app can get all features by composing a multi-command entity. For example:

Current start process = {acquire, localize, start} // where acquire means start container
Current shutdown process = {stop, release} // where release means give up container
Only localize = {localize}
Start another process = {localize, start}
Start another process after shutting down the first process = {stop, start} or {stop, localize, start}
Start another process and then shut down the first process = {start, stop}
New container shutdown = {release} // at this point there may be 0 or more processes running, all of which will be stopped

> De-link container life cycle from the process and add ability to execute
> multiple processes in the same long-lived container
> ------------------------------------------------------------------------
>
>                 Key: YARN-1040
>                 URL: https://issues.apache.org/jira/browse/YARN-1040
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 3.0.0
>            Reporter: Steve Loughran
>
> The AM should be able to exec >1 process in a container, rather than have the
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly,
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that
> something could be run in the container while a long-lived process was
> already running. This can be useful in monitoring and reconfiguring the
> long-lived process, as well as shutting it down.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
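
To make the multi-command compositions in the comment concrete, here is a minimal Python sketch of how a container could apply an ordered command list. It is a toy model, not YARN code: the `Command` enum, the `Container` class and its `apply` method are all hypothetical names. It also assumes that `stop` targets the oldest running process, which is one reading consistent with the `{stop, start}` and `{start, stop}` examples but is not stated in the comment.

```python
from dataclasses import dataclass, field
from enum import Enum


class Command(Enum):
    """Hypothetical command names, taken from the compositions in the comment."""
    ACQUIRE = "acquire"    # start the container
    LOCALIZE = "localize"  # fetch resources into the container
    START = "start"        # launch one more process
    STOP = "stop"          # assumption: stops the oldest running process
    RELEASE = "release"    # give up the container, stopping any processes


@dataclass
class Container:
    acquired: bool = False
    localizations: int = 0
    running: list = field(default_factory=list)  # ids of live processes
    next_pid: int = 1

    def apply(self, commands):
        """Apply a multi-command entity in order, as a single RPC might."""
        for cmd in commands:
            if cmd is Command.ACQUIRE:
                if self.acquired:
                    raise ValueError("container already acquired")
                self.acquired = True
                continue
            if not self.acquired:
                raise ValueError(f"{cmd.value} requires an acquired container")
            if cmd is Command.LOCALIZE:
                self.localizations += 1
            elif cmd is Command.START:
                self.running.append(self.next_pid)
                self.next_pid += 1
            elif cmd is Command.STOP:
                if not self.running:
                    raise ValueError("no running process to stop")
                self.running.pop(0)
            elif cmd is Command.RELEASE:
                # 0 or more processes may be running; all are stopped.
                self.running.clear()
                self.acquired = False
        return self


if __name__ == "__main__":
    c = Container()
    c.apply([Command.ACQUIRE, Command.LOCALIZE, Command.START])  # current start
    c.apply([Command.START, Command.STOP])  # start another, then stop the first
    c.apply([Command.STOP])                 # container now has no process
    c.apply([Command.LOCALIZE])             # re-localize without starting
    c.apply([Command.RELEASE])              # new container shutdown
    print(c)
```

In this model the container stays acquired with zero processes and can be localized without a start, which corresponds to points 1 and 3 of the comment. The per-command logic stays trivial because sequencing is left to the caller.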