[ 
https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15165668#comment-15165668
 ] 

Bikas Saha commented on YARN-1040:
----------------------------------

Agree with your scenarios. 

I am trying to figure out a way to keep this from becoming a YARN problem (both 
initial work and ongoing maintenance). E.g. we don't know for sure whether the 
resource needs to be x, 2x or 3x. That is an allocation decision and cannot be 
made without the RM's blessing. Increasing container resources is already 
work in progress and may become another NM primitive. Next, what is the 
ordering of the tasks during an upgrade? We could implement one of many 
possibilities, but then we would be stuck with bug-fixing or improving it, and 
it could be used as a precedent for implementing yet another upgrade policy. 

Hence my suggestion of creating composable primitives that can be used to 
easily implement these flows, leaving it to the apps to determine the exact 
upgrade paths. Perhaps Slider is a better place, since it could wrap different 
upgrade possibilities using the composable primitives, e.g. 
SliderStopAllUpgradePolicy or SliderConcurrentUpgradePolicy. Or they could be 
provided as helper libs in YARN/NMClient so apps don't have to compose the 
primitives from scratch. The main aim is to keep core YARN/NM simple by 
creating primitives and layering complexity on top. This approach may be 
simpler and more incremental to develop, test and deploy. Of course, these are 
my personal design views :)
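
To make the layering concrete, here is a rough sketch in Java. Everything in 
it is hypothetical: ContainerPrimitives, UpgradePolicy and the two policy 
classes are illustrative names, not existing YARN, NMClient or Slider APIs, 
and "concurrent" is assumed to mean "start the new process next to the old 
one, then stop the old one". The point is only that each policy is an 
ordering of the same two primitive calls.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-container primitives; NOT an existing YARN/NM API. */
interface ContainerPrimitives {
    /** Starts a process in the container and returns a handle for it. */
    String startProcess(String containerId, String version);

    /** Stops one process; the container itself stays allocated. */
    void stopProcess(String containerId, String processHandle);
}

/** A policy is just an ordering of primitive calls (hypothetical interface). */
interface UpgradePolicy {
    /** Takes containerId -> running handle, returns containerId -> new handle. */
    Map<String, String> upgrade(ContainerPrimitives nm,
                                Map<String, String> running,
                                String newVersion);
}

/** Stops every old process first, then starts the new version everywhere. */
class StopAllUpgradePolicy implements UpgradePolicy {
    @Override
    public Map<String, String> upgrade(ContainerPrimitives nm,
                                       Map<String, String> running,
                                       String newVersion) {
        for (Map.Entry<String, String> e : running.entrySet()) {
            nm.stopProcess(e.getKey(), e.getValue());
        }
        Map<String, String> next = new LinkedHashMap<>();
        for (String containerId : running.keySet()) {
            next.put(containerId, nm.startProcess(containerId, newVersion));
        }
        return next;
    }
}

/** Assumed semantics: per container, start the new process, then stop the old. */
class ConcurrentUpgradePolicy implements UpgradePolicy {
    @Override
    public Map<String, String> upgrade(ContainerPrimitives nm,
                                       Map<String, String> running,
                                       String newVersion) {
        Map<String, String> next = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : running.entrySet()) {
            String newHandle = nm.startProcess(e.getKey(), newVersion);
            nm.stopProcess(e.getKey(), e.getValue());
            next.put(e.getKey(), newHandle);
        }
        return next;
    }
}

/** In-memory stand-in that only records the order of primitive calls. */
class RecordingPrimitives implements ContainerPrimitives {
    final List<String> log = new ArrayList<>();

    @Override
    public String startProcess(String containerId, String version) {
        String handle = containerId + "/" + version;
        log.add("start " + handle);
        return handle;
    }

    @Override
    public void stopProcess(String containerId, String processHandle) {
        log.add("stop " + processHandle);
    }
}
```

Running either policy against RecordingPrimitives shows the difference in 
ordering without touching a cluster. Resource sizing is deliberately absent 
from the sketch, since that stays an RM allocation decision.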

Thoughts?


> De-link container life cycle from the process and add ability to execute 
> multiple processes in the same long-lived container
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1040
>                 URL: https://issues.apache.org/jira/browse/YARN-1040
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 3.0.0
>            Reporter: Steve Loughran
>
> The AM should be able to exec >1 process in a container, rather than have the 
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, 
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that 
> something could be run in the container while a long-lived process was 
> already running. This can be useful in monitoring and reconfiguring the 
> long-lived process, as well as shutting it down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
