[
https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163564#comment-15163564
]
Arun Suresh commented on YARN-1040:
-----------------------------------
Spent some time going through the conversation (this one as well as YARN-1404).
Given that this has been tracked as a requirement for in-place application
upgrades, and it has been some time since any activity was posted here,
[~bikassaha] / [~vinodkv] / [~hitesh] / [~tucu00] / [~steve_l], can you kindly
clarify the following?
# Are we still trying to handle the case where more than one process runs
against a container *at the same time*?
# Have we decided that allowing a Container with 0 processes running is a bad
idea?
From the context of getting application upgrades working, I guess 1) can be
relaxed to exactly 1 process running under a container, with the AM having the
option of explicitly starting it via the {{startProcess(containerLaunchContext)}}
API Bikas mentioned. An additional constraint could be that {{startProcess}}
has to be called within a timeout if no ContainerLaunchContext was provided
with the initial {{startContainer()}}; otherwise the NM will deem the container
dead.
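To make the timeout idea concrete, here is a rough sketch of the container states it implies. Everything in it is hypothetical: {{ContainerSketch}}, its states, and the explicit clock arguments are made up for illustration and are not existing NodeManager classes, and a plain String stands in for a ContainerLaunchContext. It only models the "start within a timeout or be deemed dead" rule; a {{startProcess}} call on a container that already has a process is simply rejected here.

```java
/** Toy model of the proposed start timeout; hypothetical, not NodeManager code. */
final class ContainerSketch {
    enum State { WAITING_FOR_PROCESS, RUNNING, DEAD }

    private final long startDeadlineMillis;
    private State state;
    private String command; // stands in for a ContainerLaunchContext

    /** Models startContainer(); pass null when no launch context is supplied. */
    ContainerSketch(String initialCommand, long nowMillis, long startTimeoutMillis) {
        this.startDeadlineMillis = nowMillis + startTimeoutMillis;
        this.command = initialCommand;
        this.state = (initialCommand == null) ? State.WAITING_FOR_PROCESS : State.RUNNING;
    }

    /** Periodic check: an empty container past its deadline is deemed dead. */
    void tick(long nowMillis) {
        if (state == State.WAITING_FOR_PROCESS && nowMillis > startDeadlineMillis) {
            state = State.DEAD;
        }
    }

    /** Models startProcess(); succeeds only on a live container with no process. */
    boolean startProcess(String newCommand, long nowMillis) {
        tick(nowMillis);
        if (state != State.WAITING_FOR_PROCESS) {
            return false; // already running, or timed out and dead
        }
        command = newCommand;
        state = State.RUNNING;
        return true;
    }

    State state() { return state; }

    String command() { return command; }
}
```

Passing the current time in explicitly keeps the sketch deterministic; a real implementation would presumably hang this off whatever clock and monitoring thread the NM already uses.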
In addition, I was thinking:
# If a process is already running in the container when a
{{startProcess(ContainerLaunchContext)}} is received, then the first process is
killed and another is started using the new {{ContainerLaunchContext}}
# Maybe we can refine the above by adding an
{{upgradeProcess(ContainerLaunchContext)}} API that can additionally take on a
policy like:
## auto-rollback if the new process does not start within a timeout.
## Rollback could either mean keeping the old process running until the upgraded
process is up -or- if we want to preserve the semantics of only 1 process per
container, first kill the old process and try to start the new one, and on
failure restart the old version.
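The two rollback flavours above could be sketched roughly as follows. All names here ({{UpgradeSketch}}, {{RollbackPolicy}}, {{Launcher}}) are hypothetical and only illustrate the ordering of kill and start; the launcher's boolean result stands in for "the process came up within the timeout", and Strings stand in for ContainerLaunchContexts.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of upgradeProcess() with rollback; hypothetical, not NodeManager code. */
final class UpgradeSketch {
    enum RollbackPolicy { KEEP_OLD_UNTIL_NEW_IS_UP, KILL_OLD_THEN_RESTART_ON_FAILURE }

    /** Returns true if the command came up within the (assumed) start timeout. */
    interface Launcher { boolean startedWithinTimeout(String command); }

    private final Launcher launcher;
    private final List<String> log = new ArrayList<>();
    private String running; // null means no process in the container

    UpgradeSketch(String initialCommand, Launcher launcher) {
        this.running = initialCommand;
        this.launcher = launcher;
    }

    /** Returns true if the container ends up running newCommand. */
    boolean upgradeProcess(String newCommand, RollbackPolicy policy) {
        String old = running;
        if (policy == RollbackPolicy.KEEP_OLD_UNTIL_NEW_IS_UP) {
            // Old process stays up, so two processes briefly coexist.
            if (launcher.startedWithinTimeout(newCommand)) {
                log.add("start " + newCommand);
                log.add("kill " + old);
                running = newCommand;
                return true;
            }
            log.add("failed " + newCommand);
            return false; // old process was never touched
        }
        // Preserve "only 1 process per container": kill first, then start.
        log.add("kill " + old);
        running = null;
        if (launcher.startedWithinTimeout(newCommand)) {
            log.add("start " + newCommand);
            running = newCommand;
            return true;
        }
        log.add("failed " + newCommand);
        // Roll back; if even the old version fails, the container is left empty.
        if (launcher.startedWithinTimeout(old)) {
            log.add("restart " + old);
            running = old;
        }
        return false;
    }

    String running() { return running; }

    List<String> log() { return log; }
}
```

The trade-off shows up in the sketch: the first policy never leaves the container without a process but needs more than one process per container for a short window, while the second keeps the one-process invariant at the cost of downtime during a failed upgrade.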
If everyone is OK with the above, I volunteer to post a preliminary patch or,
if the details get dicier during investigation, to put up a doc instead.
Thoughts?
> De-link container life cycle from the process and add ability to execute
> multiple processes in the same long-lived container
> ----------------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-1040
> URL: https://issues.apache.org/jira/browse/YARN-1040
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Affects Versions: 3.0.0
> Reporter: Steve Loughran
>
> The AM should be able to exec >1 process in a container, rather than have the
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly,
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that
> something could be run in the container while a long-lived process was
> already running. This can be useful in monitoring and reconfiguring the
> long-lived process, as well as shutting it down.