[ https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15211613#comment-15211613 ]
Arun Suresh commented on YARN-1040:
-----------------------------------
First, thank you [~vinodkv], [~vvasudev] and [~bikassaha] for reviewing the
doc and chiming in with your thoughts.
[~bikassaha]
bq. can see the argument of asking users to use new APIs for new features but
requiring existing apps to change their AM/RM implementations….
We might not actually need to do this, provided we ensure that the existing
externally facing methods on the ContainerManagementProtocol and
ApplicationMasterProtocol keep working as expected, and introduce the new
methods in wrapper protocols. Apps that need the new functionality can use the
new API, and those that don't can stick with the old one (until a major
release, when we can retire the old protocols). We tried something along the
same lines in YARN-2885 (not yet committed to trunk), where a
DistributedSchedulingProtocol extends the ApplicationMasterProtocol and still
exposes the old API.
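A rough Java sketch of the wrapper-protocol idea. Everything here is hypothetical and heavily simplified: LegacyAmProtocol, ExtendedAmProtocol and SketchAmService are made-up names, and the real ApplicationMasterProtocol exchanges request/response records rather than plain Strings. The point is only that the new interface inherits the old method unchanged, so an app coded against the old interface keeps working against the same service.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for an existing AM-facing protocol.
// The real ApplicationMasterProtocol exchanges request/response records instead.
interface LegacyAmProtocol {
    /** Requests {@code count} containers and returns their ids. */
    List<String> allocate(int count);
}

// Hypothetical wrapper protocol: it inherits the legacy method unchanged and
// only adds new calls, so code written against LegacyAmProtocol keeps working.
interface ExtendedAmProtocol extends LegacyAmProtocol {
    /** Asks to launch one more process inside an existing allocation. */
    String launchInAllocation(String allocationId, String command);
}

// A single service implementation can serve old and new clients alike.
class SketchAmService implements ExtendedAmProtocol {
    private int counter = 0;

    @Override
    public List<String> allocate(int count) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            ids.add("container_" + (++counter));
        }
        return ids;
    }

    @Override
    public String launchInAllocation(String allocationId, String command) {
        // A real service would forward this to the NM; the sketch only
        // returns a description of the request.
        return allocationId + ": launched " + command;
    }
}
```

An old app would hold a LegacyAmProtocol reference and never see launchInAllocation; only apps that opt in use the ExtendedAmProtocol type.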
bq. ..just to be able to launch multiple processes does not seem empathetic.
Hmmm.. Given that launching multiple processes is a new feature, I feel it
should be fine to require apps to use the new APIs, no?
----
[~vvasudev]
bq. Is AllocationId a new class that we will introduce or a rename of the
existing ContainerId class?
I expect it to be a new class, but my thinking was that it should replace the
existing ContainerId in the RM. To preserve backward compatibility, for apps
using the older API, we can somehow transform the AllocationId into a
ContainerId when the RM responds to the app.
bq. will the NM generate container ids for the individual containers?
That was my plan. As mentioned above, for older apps, ContainerId =
AllocationId + '-0', and for apps requesting multiple containers per
allocation, ContainerId = AllocationId + '-' + index (where index increments
from 0).
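A minimal Java sketch of that naming scheme, assuming AllocationId and ContainerId can be modeled as plain Strings (they are not the real YARN record classes) and that the NM keeps a per-allocation counter. ContainerIds and NmContainerIdGenerator are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helpers: AllocationId and ContainerId are modeled as plain
// Strings, not as the real YARN record classes.
final class ContainerIds {
    private ContainerIds() {}

    /** Id of the index-th container started under one allocation. */
    static String of(String allocationId, int index) {
        if (index < 0) {
            throw new IllegalArgumentException("index must be >= 0");
        }
        return allocationId + "-" + index;
    }

    /** Id reported to apps on the older API: always the first (index 0) container. */
    static String forLegacyApp(String allocationId) {
        return of(allocationId, 0);
    }
}

// NM-side generator: hands out increasing indexes, starting at 0, per allocation.
class NmContainerIdGenerator {
    private final Map<String, Integer> nextIndex = new HashMap<>();

    synchronized String nextContainerId(String allocationId) {
        // merge() returns the updated count, so subtract 1 to get a 0-based index.
        int index = nextIndex.merge(allocationId, 1, Integer::sum) - 1;
        return ContainerIds.of(allocationId, index);
    }
}
```

Because a legacy app only ever gets index 0, its single container id is derivable from the AllocationId alone, which is what lets the RM hand it back without tracking containers.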
bq. An AM can receive only a single allocation on a Node, The Scheduler will
"bundle" all Allocations on a Node for an app into a single Large Allocation.
My thinking was that, given this feature will allow an app to start multiple
containers using a single allocation, an app can now reuse the same allocation
to start a new container rather than obtain a new one. This would minimize the
number of Allocations the RM needs to give out.
Thinking further, I understand how this might break backward compatibility (for
apps using the older API and expecting multiple ContainerTokens on the same
node), so I guess we can remove this restriction and make sure the "bundling"
happens only for apps using the new API.
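A rough Java sketch of "bundling only for new-API apps", assuming a grant can be reduced to (app, node, memoryMb) and that the scheduler knows which apps use the new API. Grant and AllocationBundler are hypothetical names; the real scheduler works with Resource and RMContainer objects and would also have to handle vcores, tokens, etc.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** One resource grant made by the scheduler (hypothetical, simplified). */
final class Grant {
    final String app;
    final String node;
    final int memoryMb;

    Grant(String app, String node, int memoryMb) {
        this.app = app;
        this.node = node;
        this.memoryMb = memoryMb;
    }
}

final class AllocationBundler {
    private AllocationBundler() {}

    /**
     * Merges grants for the same (app, node) pair into one larger grant, but only
     * for apps on the new API; grants for legacy apps pass through unchanged.
     */
    static List<Grant> bundle(List<Grant> grants, Set<String> newApiApps) {
        List<Grant> out = new ArrayList<>();
        // Keyed by (app, node); List.of gives value-based equals/hashCode.
        Map<List<String>, Grant> merged = new LinkedHashMap<>();
        for (Grant g : grants) {
            if (!newApiApps.contains(g.app)) {
                out.add(g);  // legacy app: one allocation (and token) per grant, as today
                continue;
            }
            List<String> key = List.of(g.app, g.node);
            Grant prev = merged.get(key);
            int total = (prev == null ? 0 : prev.memoryMb) + g.memoryMb;
            merged.put(key, new Grant(g.app, g.node, total));
        }
        out.addAll(merged.values());
        return out;
    }
}
```

Legacy apps still see one grant per container on a node, so anything expecting multiple ContainerTokens on the same node is unaffected.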
bq. Are you referring to the current ContainerId class? If yes, why is it known
only to the AM?
This also concerns the points [~vinodkv] brought up about container exit
notifications.
*Today* the ContainerId is known to the RM, since:
* The RM generates the ContainerId, so it obviously needs to know about it.
* The primary way the RM reclaims resources from a Node, to schedule waiting
apps, is the Container Complete / Killed notification in the Node heartbeat;
the ContainerId is needed to match that notification to the container's
resources.
* This is also the primary way the AM is notified of a completed / killed
container, viz. via the RM allocateResponse.
In the new scheme of things:
* An Allocation technically never "completes" unless the AM explicitly
deactivates it, at which point the Node can notify the RM of the terminated
Allocation.
* For backward compatibility, single-use allocations will automatically be
deactivated, and the RM notified, when the associated container completes.
* An AM on restart / failover will be notified by the RM of existing
Allocations and can query the NM directly for the status of individual
containers.
* An NM on restart need not report the status of every container, just the
Allocations that were active on the NM. The respective AMs can then query the
NM to obtain the status of their containers.
For the above cases, the RM does not need to know about the ContainerId per se,
only the AllocationId. The only other case I can think of where the RM needs to
know about individual containers is smarter pre-emption, where the RM picks
specific containers within an Allocation to kill rather than the whole
Allocation (I think I mentioned this in the doc too). But I guess that can be
addressed in subsequent iterations.
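A minimal Java sketch of the NM-side lifecycle described in the list above, under these assumptions: NmAllocation is a hypothetical class, the RM is represented only by a list of notification strings (the "ALLOCATION_DONE" format is made up), and killing still-running containers on deactivation is not modeled. It shows the RM hearing only about AllocationIds, while per-container status stays on the NM for the AM to query.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical NM-side model of one allocation; none of these names are real
// YARN classes. The RM is represented only by a list of notification strings.
final class NmAllocation {
    enum ContainerState { RUNNING, COMPLETED }

    private final String allocationId;
    private final boolean singleUse;
    private final List<String> rmNotifications;
    private final Map<String, ContainerState> containers = new LinkedHashMap<>();
    private boolean active = true;
    private int nextIndex = 0;

    NmAllocation(String allocationId, boolean singleUse, List<String> rmNotifications) {
        this.allocationId = allocationId;
        this.singleUse = singleUse;
        this.rmNotifications = rmNotifications;
    }

    /** Starts a process under this allocation; the NM picks the ContainerId. */
    String startContainer() {
        if (!active) {
            throw new IllegalStateException("allocation " + allocationId + " is deactivated");
        }
        if (singleUse && nextIndex > 0) {
            throw new IllegalStateException("single-use allocation already used");
        }
        String id = allocationId + "-" + nextIndex++;
        containers.put(id, ContainerState.RUNNING);
        return id;
    }

    /** Called when a process exits; the RM is not told about individual containers. */
    void containerCompleted(String containerId) {
        if (!containers.containsKey(containerId)) {
            throw new IllegalArgumentException("unknown container " + containerId);
        }
        containers.put(containerId, ContainerState.COMPLETED);
        if (singleUse) {
            deactivate();  // backward compatibility: a single-use allocation ends with its container
        }
    }

    /** Explicit AM request; the only way a multi-use allocation ends. */
    void deactivate() {
        if (active) {
            active = false;
            rmNotifications.add("ALLOCATION_DONE " + allocationId);  // RM sees only the AllocationId
        }
    }

    /** What an AM (e.g. after failover) would query directly from the NM. */
    ContainerState status(String containerId) {
        return containers.get(containerId);
    }

    boolean isActive() {
        return active;
    }
}
```

For a multi-use allocation, a container exiting leaves the allocation active and sends nothing to the RM; only the explicit deactivate() does.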
----
[~vinodkv]
You brought up some good points; I will incorporate them into the doc.
If you guys are fine with it, I plan to break this work up into separate JIRAs
under YARN-4726. I feel we can have a more focused discussion there on
specific aspects of the design.
> De-link container life cycle from an Allocation
> -----------------------------------------------
>
> Key: YARN-1040
> URL: https://issues.apache.org/jira/browse/YARN-1040
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Affects Versions: 3.0.0
> Reporter: Steve Loughran
> Attachments: YARN-1040-rough-design.pdf
>
>
> The AM should be able to exec >1 process in a container, rather than have the
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly,
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that
> something could be run in the container while a long-lived process was
> already running. This can be useful in monitoring and reconfiguring the
> long-lived process, as well as shutting it down.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)