[ 
https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163864#comment-15163864
 ] 

Arun Suresh commented on YARN-1040:
-----------------------------------

Thanks for the feedback [~bikassaha]

I understand we might not want to place artificial constraints on apps; I was 
just trying to scope out the bare minimum effort required specifically for 
long-running container upgrades. That said, I'm all for going the whole hog 
(allowing 0 or 1+ processes) if that turns out to be easier.

Some thoughts specifically with regard to container upgrade:
# If we allow multiple processes per container, we might need 
{{startProcess()}} to return a *processId*, which the AM can then use to 
address the process in subsequent calls like {{stopProcess()}}. This might 
complicate the AM's state, so maybe we can leave it out of the first cut.
# w.r.t resource re-localization, as per YARN-4597, we are exploring 
localization as a service and possibly re-localization on the fly.
# I like the idea of clubbing multiple API calls in the same RPC. But should 
*upgrade* be a first-class semantic, or should it be expressed as a 
{{localize v2, start v2, stop v1}} API combo? One reason to distinguish the two 
is the case of having both versions up at the same time until the new version 
stabilizes: in an upgrade, the Container should probably be allowed to go to 2x 
its allocated resource limit for a period of time, but in the case where we are 
just starting 2 processes, this should probably not be allowed.
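
To make points 1 and 3 concrete, here is a rough Java sketch. It is not NodeManager code and not a proposed API: the class {{ContainerSketch}} and the methods {{limitMb()}}, {{usedMb()}} and {{upgrade()}} are hypothetical names made up for illustration, memory in MB stands in for all resources, and the wait for the new version to stabilize is elided. It only shows {{startProcess()}} handing back a *processId*, and an explicit upgrade call being the one place where the 2x ceiling applies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model, not a YARN API: one container hosting several processes.
final class ContainerSketch {
    private final long allocatedMb;
    // processId -> memory (MB) reserved by that process
    private final Map<Integer, Long> processes = new LinkedHashMap<>();
    private int nextProcessId = 1;
    private boolean upgrading = false;

    ContainerSketch(long allocatedMb) {
        this.allocatedMb = allocatedMb;
    }

    // The ceiling is relaxed to 2x only while an upgrade is in flight.
    long limitMb() {
        return upgrading ? 2 * allocatedMb : allocatedMb;
    }

    long usedMb() {
        long total = 0;
        for (long mb : processes.values()) {
            total += mb;
        }
        return total;
    }

    // Returns a processId the AM can use in later calls such as stopProcess().
    int startProcess(long memMb) {
        if (usedMb() + memMb > limitMb()) {
            throw new IllegalStateException("would exceed limit of " + limitMb() + " MB");
        }
        int id = nextProcessId++;
        processes.put(id, memMb);
        return id;
    }

    void stopProcess(int processId) {
        if (processes.remove(processId) == null) {
            throw new IllegalArgumentException("unknown processId " + processId);
        }
    }

    // Upgrade as a first-class call: start v2, then stop v1, with 2x headroom in between.
    int upgrade(int oldProcessId, long newMemMb) {
        Long oldMemMb = processes.get(oldProcessId);
        if (oldMemMb == null) {
            throw new IllegalArgumentException("unknown processId " + oldProcessId);
        }
        // The new version must fit in the normal allocation once the old one is gone.
        if (usedMb() - oldMemMb + newMemMb > allocatedMb) {
            throw new IllegalStateException("new version would not fit after the upgrade");
        }
        upgrading = true;
        try {
            int newId = startProcess(newMemMb);
            // A real implementation would wait here until v2 stabilizes.
            stopProcess(oldProcessId);
            return newId;
        } finally {
            upgrading = false;
        }
    }
}
```

With a 1024 MB container already running a 1024 MB process, a plain second {{startProcess(1024)}} is rejected in this sketch, while {{upgrade(v1, 1024)}} succeeds and returns a new processId. Expressing the same thing as a {{start v2, stop v1}} combo would need some other way to tell the NM that the temporary overshoot is intentional.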


> De-link container life cycle from the process and add ability to execute 
> multiple processes in the same long-lived container
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1040
>                 URL: https://issues.apache.org/jira/browse/YARN-1040
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 3.0.0
>            Reporter: Steve Loughran
>
> The AM should be able to exec >1 process in a container, rather than have the 
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, 
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that 
> something could be run in the container while a long-lived process was 
> already running. This can be useful in monitoring and reconfiguring the 
> long-lived process, as well as shutting it down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
