[ https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163564#comment-15163564 ]

Arun Suresh commented on YARN-1040:
-----------------------------------

I spent some time going through the conversation (this one as well as YARN-1404).
Given that this has been tracked as a requirement for in-place application 
upgrades, and that it has been some time since any activity was posted here, 
[~bikassaha] / [~vinodkv] / [~hitesh] / [~tucu00] / [~steve_l], can you kindly 
clarify the following?
# Are we still trying to handle the case where more than 1 process is running 
against a container *at the same time*?
# Have we decided that allowing a Container with 0 processes running is a bad 
idea?

From the context of getting application upgrades working, I guess (1) can be 
relaxed to exactly 1 process running under a container, with the AM having the 
option of starting it explicitly via the {{startProcess(containerLaunchContext)}} 
API Bikas mentioned. An additional constraint could be that {{startProcess}} 
has to be called within a timeout if no {{ContainerLaunchContext}} was 
provided with the initial {{startContainer()}}; otherwise the NM will deem the 
container dead.
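A minimal sketch of the timeout constraint described above, assuming a container may be allocated without a launch context and must then receive a start within a fixed window. {{PendingContainer}}, {{isDead}} and the use of a plain string for the launch context are hypothetical illustrations, not NodeManager classes; time is passed in explicitly so the deadline check is deterministic. Refusing a second start while a process is running is one possible reading of "exactly 1 process" and is an assumption of this sketch.

```java
/**
 * Hypothetical sketch (not a YARN API): a container allocated without a
 * launch context must get startProcess() within a timeout, or it is
 * treated as dead.
 */
final class PendingContainer {
    private final long allocatedAtMillis;
    private final long startTimeoutMillis;
    private String launchContext; // null until a process has been started

    PendingContainer(long allocatedAtMillis, long startTimeoutMillis,
                     String initialContext) {
        this.allocatedAtMillis = allocatedAtMillis;
        this.startTimeoutMillis = startTimeoutMillis;
        // May be null: the AM defers the start to a later startProcess().
        this.launchContext = initialContext;
    }

    /** Dead only if nothing was ever started and the deadline has passed. */
    boolean isDead(long nowMillis) {
        return launchContext == null
                && nowMillis - allocatedAtMillis > startTimeoutMillis;
    }

    /** Starts the single process; refused if dead or already running. */
    boolean startProcess(String context, long nowMillis) {
        if (isDead(nowMillis) || launchContext != null) {
            return false;
        }
        launchContext = context;
        return true;
    }

    String launchContext() {
        return launchContext;
    }
}
```

With a 1000 ms window, a container allocated at time 0 with no context is still alive at 500 ms and dead at 1001 ms; one created with an initial context never hits the deadline.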

In addition, I was thinking:
# If a process is already running in the container when a 
{{startProcess(ContainerLaunchContext)}} is received, the first process is 
killed and another is started using the new {{ContainerLaunchContext}}.
# Maybe we can refine the above by adding an 
{{upgradeProcess(ContainerLaunchContext)}} API that can additionally take a 
policy like:
## auto-rollback if the new process does not start within a timeout.
## Rollback could either mean keeping the old process running until the 
upgraded process is up, -or-, if we want to preserve the semantics of only 1 
process per container, first killing the old process, trying to start the new 
one, and restarting the old version on failure.
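One way to picture the two rollback policies is the sketch below. {{UpgradableContainer}}, {{RollbackPolicy}} and the {{Predicate}} that stands in for "the process came up within the timeout" are all hypothetical names chosen for illustration; nothing here is an existing YARN API. It assumes launch contexts can be modeled as strings and that the rollback restart of the old version can itself fail.

```java
import java.util.function.Predicate;

/** Hypothetical rollback policies for an upgradeProcess() call. */
enum RollbackPolicy {
    /** Old process keeps running until the upgraded one is up. */
    KEEP_OLD_UNTIL_NEW_UP,
    /** One process per container: kill old first, restart it on failure. */
    KILL_OLD_FIRST
}

/** Hypothetical model of a container supporting restart and upgrade. */
final class UpgradableContainer {
    // Stand-in for "the process started within the timeout".
    private final Predicate<String> startsWithinTimeout;
    private String current; // context of the live process, or null

    UpgradableContainer(String initial, Predicate<String> startsWithinTimeout) {
        this.current = initial;
        this.startsWithinTimeout = startsWithinTimeout;
    }

    /** startProcess semantics: kill the running process, start the new one. */
    boolean startProcess(String context) {
        current = null; // existing process is killed first
        if (startsWithinTimeout.test(context)) {
            current = context;
            return true;
        }
        return false; // container is left with no running process
    }

    /** Returns true if the upgrade succeeded; otherwise rolls back. */
    boolean upgradeProcess(String newContext, RollbackPolicy policy) {
        String old = current;
        if (policy == RollbackPolicy.KEEP_OLD_UNTIL_NEW_UP) {
            // Old and new coexist briefly; old is replaced only once new is up.
            if (startsWithinTimeout.test(newContext)) {
                current = newContext;
                return true;
            }
            return false; // old process was never stopped
        }
        // KILL_OLD_FIRST: never more than one process at a time.
        current = null;
        if (startsWithinTimeout.test(newContext)) {
            current = newContext;
            return true;
        }
        // Rollback: try to restart the previous version.
        if (old != null && startsWithinTimeout.test(old)) {
            current = old;
        }
        return false;
    }

    String current() {
        return current;
    }
}
```

The trade-off is visible in the two branches: {{KEEP_OLD_UNTIL_NEW_UP}} avoids downtime but briefly needs resources for two processes, while {{KILL_OLD_FIRST}} keeps the one-process invariant at the cost of an outage window and a second start attempt on rollback.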

If everyone is OK with the above, I volunteer to post a preliminary patch, or, 
if the details get dicier during investigation, to put up a doc.

Thoughts?


> De-link container life cycle from the process and add ability to execute 
> multiple processes in the same long-lived container
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1040
>                 URL: https://issues.apache.org/jira/browse/YARN-1040
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 3.0.0
>            Reporter: Steve Loughran
>
> The AM should be able to exec >1 process in a container, rather than have the 
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, 
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that 
> something could be run in the container while a long-lived process was 
> already running. This can be useful in monitoring and reconfiguring the 
> long-lived process, as well as shutting it down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
