[jira] [Commented] (YARN-1040) De-link container life cycle from an Allocation

Arun Suresh (JIRA) Fri, 25 Mar 2016 01:57:07 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15211613#comment-15211613
 ]


Arun Suresh commented on YARN-1040:
-----------------------------------

Firstly, thank you [~vinodkv], [~vvasudev] and [~bikassaha] for reviewing the 
doc and chiming in with your thoughts.

[~bikassaha]

bq. can see the argument of asking users to use new APIs for new features but 
requiring existing apps to change their AM/RM implementations….
We might not actually need to do this if we ensure that the existing 
external-facing methods on the ContainerManagementProtocol and 
ApplicationMasterProtocol work as expected, and introduce the new methods in 
wrapper protocols. Apps that need the new functionality can use the new API, 
and those that don’t can stick with the old ones (until a major release, when 
we can retire the old protocols). We have tried something along the same lines 
in YARN-2885 (not committed to trunk yet), where we have a 
DistributedSchedulingProtocol that extends the ApplicationMasterProtocol and 
still exposes the old API.
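
As a rough illustration of the wrapper-protocol idea (not the actual YARN-2885 
code), the sketch below uses simplified stand-in interfaces instead of the real 
YARN classes. LegacyAmProtocol stands in for ApplicationMasterProtocol; 
ExtendedAmProtocol, SketchAmService, allocateWithAllocationId and the String 
request/response types are all hypothetical names chosen for this example.

```java
// Stand-in for the existing protocol (hypothetical, simplified signature;
// the real ApplicationMasterProtocol exchanges request/response records).
interface LegacyAmProtocol {
    String allocate(String request);
}

// Hypothetical wrapper protocol: inherits the old method unchanged and adds
// a new method for apps that opt in to the new functionality.
interface ExtendedAmProtocol extends LegacyAmProtocol {
    String allocateWithAllocationId(String request);
}

// A single server-side implementation can serve both kinds of client.
class SketchAmService implements ExtendedAmProtocol {
    @Override
    public String allocate(String request) {
        return "container-for:" + request;
    }

    @Override
    public String allocateWithAllocationId(String request) {
        return "allocation-for:" + request;
    }
}

class WrapperProtocolDemo {
    public static void main(String[] args) {
        SketchAmService service = new SketchAmService();
        LegacyAmProtocol oldClient = service;    // existing app: old API only
        ExtendedAmProtocol newClient = service;  // new app: old and new API
        System.out.println(oldClient.allocate("app-1"));
        System.out.println(newClient.allocateWithAllocationId("app-2"));
    }
}
```

Because the extended interface is a subtype of the old one, an existing app 
compiled against the old interface keeps working without changes, which is the 
property the paragraph above relies on.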

bq. ..just to be able to launch multiple processes does not seem empathetic.
Hmmm.. Given that launching multiple processes is a new feature, I feel that 
it should be fine to require the app to use the new APIs, no?
        
----

[~vvasudev]

bq. Is AllocationId a new class that we will introduce or a rename of the 
existing ContainerId class?
I expect it to be a new class, but my thinking was that it should replace the 
existing ContainerId in the RM. To preserve backward compatibility, for apps 
using the older API, we can somehow transform the AllocationId into a 
ContainerId when the RM responds to the app.

bq. will the NM generate container ids for the individual containers?
That was my plan. As mentioned above, for older apps, ContainerId = 
AllocationId + '\-0', and for apps requesting multiple containers per 
allocation, ContainerId = AllocationId + '\-' + index (an id incrementing 
from 0).
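
The naming scheme above can be sketched as follows. This is only an 
illustration under assumptions: IDs are modelled as plain strings, and the 
class name ContainerIdSketch, its methods and the sample ID "alloc_0001" are 
hypothetical, not part of YARN.

```java
// Hypothetical helper; IDs are modelled as plain strings for illustration.
final class ContainerIdSketch {
    private ContainerIdSketch() {}

    // Older apps: one container per allocation, so the index is always 0.
    static String forLegacyApp(String allocationId) {
        return forIndex(allocationId, 0);
    }

    // Newer apps: an index incrementing from 0 is appended per container.
    static String forIndex(String allocationId, int index) {
        if (index < 0) {
            throw new IllegalArgumentException("index must be >= 0");
        }
        return allocationId + "-" + index;
    }

    // Recover the AllocationId by dropping the trailing "-<index>" suffix.
    static String allocationIdOf(String containerId) {
        int dash = containerId.lastIndexOf('-');
        if (dash < 0) {
            throw new IllegalArgumentException("no index suffix: " + containerId);
        }
        return containerId.substring(0, dash);
    }

    public static void main(String[] args) {
        System.out.println(forLegacyApp("alloc_0001"));
        System.out.println(forIndex("alloc_0001", 2));
        System.out.println(allocationIdOf("alloc_0001-2"));
    }
}
```

Since the AllocationId is a prefix of every derived ContainerId, whoever holds 
a ContainerId can map it back to its allocation without a lookup table.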

bq. An AM can receive only a single allocation on a Node, The Scheduler will 
"bundle" all Allocations on a Node for an app into a single Large Allocation.
My thinking was that, given this feature will allow an app to start multiple 
containers using a single allocation, an app can now reuse the same allocation 
to start a new container, rather than obtain a new allocation. This will 
minimize the number of Allocations the RM would need to give out.
Thinking further, I understand how this might break backward compatibility (for 
apps using the older API and expecting multiple ContainerTokens on the same 
node), so I guess we can remove this restriction and make sure the "bundling" 
happens only for apps using the new API.

bq. Are you referring to the current ContainerId class? If yes, why is it known 
only to the AM?
This also concerns the points [~vinodkv] brought up about container exit 
notifications.
*Today* the ContainerId is known to the RM, since:
* The RM generates the ContainerId, so it obviously needs to know about it.
* The primary means of the RM reclaiming resources from a Node, to schedule 
waiting apps, is when it receives a Container Complete / Killed notification 
from the Node heartbeat, for which the ContainerId is necessary for matching 
the container resource.
* This is also the primary means of the AM being notified of a completed / 
killed container, viz. via the RM allocateResponse.

In the new scheme of things
* An Allocation technically never "Completes", unless the AM explicitly 
deactivates it, at which point the Node can notify the RM of the terminated 
Allocation.
* For backward compatibility, Single-use allocations will automatically be 
deactivated and notified to the RM when the associated container completes.
* An AM on restart / failover will be notified by the RM of existing 
Allocations and can query the NM directly for the status of individual 
containers.
* An NM on restart need not report the status of every container, just the 
Allocations that were active on the NM. The respective AMs can then query the 
NM and obtain the status of their containers.
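
The lifecycle rules in the list above can be sketched as NM-side bookkeeping. 
This is a minimal sketch under assumptions, not a proposed implementation: 
AllocationTrackerSketch and all of its methods are hypothetical names, IDs are 
plain strings, and the terminatedForRm list merely stands in for whatever the 
Node would report to the RM.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical NM-side bookkeeping; none of these names come from YARN.
class AllocationTrackerSketch {
    private static final class Allocation {
        final boolean singleUse;
        boolean active = true;

        Allocation(boolean singleUse) {
            this.singleUse = singleUse;
        }
    }

    private final Map<String, Allocation> allocations = new HashMap<>();
    // Stands in for the terminated Allocations the Node would report to the RM.
    private final List<String> terminatedForRm = new ArrayList<>();

    void register(String allocationId, boolean singleUse) {
        allocations.put(allocationId, new Allocation(singleUse));
    }

    // A container exit ends the Allocation only when it is single-use
    // (the backward-compatible case); otherwise the Allocation stays active.
    void containerCompleted(String allocationId) {
        if (requireActive(allocationId).singleUse) {
            deactivate(allocationId);
        }
    }

    // Explicit deactivation by the AM, or implicit for single-use Allocations.
    void deactivate(String allocationId) {
        requireActive(allocationId).active = false;
        terminatedForRm.add(allocationId);
    }

    boolean isActive(String allocationId) {
        Allocation a = allocations.get(allocationId);
        return a != null && a.active;
    }

    List<String> terminatedForRm() {
        return new ArrayList<>(terminatedForRm);
    }

    private Allocation requireActive(String allocationId) {
        Allocation a = allocations.get(allocationId);
        if (a == null || !a.active) {
            throw new IllegalStateException("no active allocation: " + allocationId);
        }
        return a;
    }
}
```

In this sketch the RM-facing report only ever contains AllocationIds, which 
matches the point that the RM does not need to see individual ContainerIds for 
these cases.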

For the above cases, the RM does not need to know about the ContainerId per se, 
only the AllocationId. The only other case I could think of for the RM knowing 
about the individual container is for the case of smarter pre-emption, where 
the RM can pick specific containers within an Allocation to be killed rather 
than the Allocation itself (I had mentioned this in the doc too I guess). But I 
guess that can be addressed in subsequent iterations.

----

[~vinodkv]

You brought up some good points; I will incorporate them into the doc.

If you guys are fine with it, I plan to open separate JIRAs under YARN-4726 
to break up this work. I feel we can have a more focused discussion there on 
specific aspects of the design.

> De-link container life cycle from an Allocation
> -----------------------------------------------
>
>                 Key: YARN-1040
>                 URL: https://issues.apache.org/jira/browse/YARN-1040
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 3.0.0
>            Reporter: Steve Loughran
>         Attachments: YARN-1040-rough-design.pdf
>
>
> The AM should be able to exec >1 process in a container, rather than have the 
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, 
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that 
> something could be run in the container while a long-lived process was 
> already running. This can be useful in monitoring and reconfiguring the 
> long-lived process, as well as shutting it down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
