[ 
https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166633#comment-15166633
 ] 

Bikas Saha commented on YARN-1040:
----------------------------------

I am sorry if I caused a digression by mentioning Slider etc.

I am not sure the upgrade scenario is the only one for this jira, since this 
jira covers a broader set. Even without upgrades, apps can change the processes 
they run in a container without losing the container allocation. The same 
primitive calls could be used without any notion of upgrade. E.g. start a Java 
process first for a Java task, then launch a Python process for a Python task. 
To the NM this is identical to starting v1 and then starting v2. So while it 
makes sense for the second case to use an API called upgrade, it may not for 
the first. 

(Unrelated to this jira: IMO, YARN should allow upgrade of app code without 
losing containers, but need not understand it deeply. E.g. YARN need not 
assume that an upgrade will need additional resources, or try to acquire them 
transparently for the application.)

For the purpose of this jira, here is what I had in mind when I opened 
YARN-1292 to delink the process lifecycle from the container.
1) new API - acquireContainer - means ask for the allocated resource. The API 
has a flag to specify whether process exit implies releaseContainer. The flag 
defaults to true for backwards compatibility. Apps that want to keep that 
behavior can explicitly pass true when using the new API; this is mainly to 
reduce the number of RPCs for apps like MR/Tez etc.
2) new API - startProcess - means start the remote process
3) new API - stopProcess - means stop the remote process
4) new API - releaseContainer - means release the allocated resource
5) Potentially a new API for localization, though in theory, this could be 
separate.
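
As a rough illustration of primitives 1) to 4), here is a minimal in-memory 
sketch in Java. Everything in it is hypothetical: ToyNodeManager, the 
releaseOnProcessExit parameter name, the state names and the method signatures 
are assumptions made for illustration, not existing YARN classes or APIs. It 
also allows only one process at a time per container, and treats stopProcess 
as the only way a process exits.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model; none of these names are real YARN classes.
final class ToyNodeManager {
    enum ContainerState { ACQUIRED, RUNNING, RELEASED }

    private static final class Container {
        final boolean releaseOnProcessExit;
        ContainerState state = ContainerState.ACQUIRED;
        String command; // command of the running process, or null when idle

        Container(boolean releaseOnProcessExit) {
            this.releaseOnProcessExit = releaseOnProcessExit;
        }
    }

    private final Map<String, Container> containers = new HashMap<>();

    // 1) Claim the allocated resource. Passing true keeps today's behavior,
    //    where process exit also releases the container.
    void acquireContainer(String id, boolean releaseOnProcessExit) {
        if (containers.containsKey(id)) {
            throw new IllegalStateException("already acquired: " + id);
        }
        containers.put(id, new Container(releaseOnProcessExit));
    }

    // 2) Start a process in a container that is held but idle.
    void startProcess(String id, String command) {
        Container c = live(id);
        if (c.state == ContainerState.RUNNING) {
            throw new IllegalStateException("process already running in " + id);
        }
        c.command = command;
        c.state = ContainerState.RUNNING;
    }

    // 3) Stop the process. The container survives unless the flag says otherwise.
    void stopProcess(String id) {
        Container c = live(id);
        if (c.state != ContainerState.RUNNING) {
            throw new IllegalStateException("no process running in " + id);
        }
        c.command = null;
        c.state = c.releaseOnProcessExit
                ? ContainerState.RELEASED
                : ContainerState.ACQUIRED;
    }

    // 4) Give the allocated resource back, stopping any running process.
    void releaseContainer(String id) {
        Container c = live(id);
        c.command = null;
        c.state = ContainerState.RELEASED;
    }

    ContainerState stateOf(String id) {
        Container c = containers.get(id);
        if (c == null) {
            throw new IllegalArgumentException("unknown container: " + id);
        }
        return c.state;
    }

    // Look up a container that is still held (acquired and not yet released).
    private Container live(String id) {
        Container c = containers.get(id);
        if (c == null || c.state == ContainerState.RELEASED) {
            throw new IllegalStateException("container not held: " + id);
        }
        return c;
    }
}
```

With the flag set to false, the Java-then-Python case becomes acquireContainer, 
startProcess, stopProcess, startProcess, stopProcess, releaseContainer, all on 
the same allocation.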

Since this fine-grained control makes the protocol chatty, we can reduce the 
RPC traffic by having a new NM RPC, say NMCommand, that takes a sequence of API 
primitives to be sent in 1 RPC.
So the current startContainer API effectively becomes NMCommand(1, 2) and 
stopContainer becomes NMCommand(3, 4). This can be leveraged for backwards 
compatibility and rolling upgrades.
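
The batching idea can be sketched in Java as follows. This is only an 
illustration under assumptions: NMCommandSketch, FakeNodeManager, the Primitive 
enum and the nmCommand signature are hypothetical names, and the fake NM just 
counts round trips and records the order of primitives instead of doing any 
real work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names are real YARN classes.
final class NMCommandSketch {
    // The four primitives, in the order they are numbered in the list above.
    enum Primitive { ACQUIRE_CONTAINER, START_PROCESS, STOP_PROCESS, RELEASE_CONTAINER }

    // Stand-in for the NM side: one nmCommand call models one RPC.
    static final class FakeNodeManager {
        int rpcCount = 0;
        final List<Primitive> executed = new ArrayList<>();

        // Run a whole sequence of primitives in a single round trip.
        void nmCommand(Primitive... sequence) {
            rpcCount++;
            executed.addAll(Arrays.asList(sequence));
        }
    }

    // Legacy startContainer expressed as NMCommand(1, 2).
    static void startContainer(FakeNodeManager nm) {
        nm.nmCommand(Primitive.ACQUIRE_CONTAINER, Primitive.START_PROCESS);
    }

    // Legacy stopContainer expressed as NMCommand(3, 4).
    static void stopContainer(FakeNodeManager nm) {
        nm.nmCommand(Primitive.STOP_PROCESS, Primitive.RELEASE_CONTAINER);
    }
}
```

Under this model an existing app still pays 2 RPCs for a start/stop pair, 
while an app that wants finer control can send any other sequence of 
primitives through the same call.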

The above items would effectively delink process and container lifecycle and 
close out this jira.

This provides fine-grained control in core YARN that can be used for various 
scenarios, e.g. upgrades, without YARN having to understand those scenarios. If 
we need to add higher-level notions for upgrades etc., those could be done as 
separate items.

I hope that helps make my thoughts concrete within the scope of this jira.


> De-link container life cycle from the process and add ability to execute 
> multiple processes in the same long-lived container
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1040
>                 URL: https://issues.apache.org/jira/browse/YARN-1040
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 3.0.0
>            Reporter: Steve Loughran
>
> The AM should be able to exec >1 process in a container, rather than have the 
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, 
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that 
> something could be run in the container while a long-lived process was 
> already running. This can be useful in monitoring and reconfiguring the 
> long-lived process, as well as shutting it down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
