[ https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15208594#comment-15208594 ]
Varun Vasudev commented on YARN-1040:
-------------------------------------
Thanks for putting up the proposal [~asuresh]!
bq. "ContainerId" becomes "AllocationId"
Is AllocationId a new class that we will introduce, or a rename of the existing
ContainerId class? In either case we have some issues to sort out: the first
option won't be backward compatible, and in the second case, will the NM
generate container IDs for the individual containers?
bq. An AM can receive only a single allocation on a Node, The Scheduler will
"bundle" all Allocations on a Node for an app into a single Large Allocation.
Can you explain why we need this restriction?
bq. Each Container is tagged with a "ContainerId" which is known only to the AM.
Are you referring to the current ContainerId class? If yes, why is it known
only to the AM?
I actually agree with both Vinod and Bikas. The current approach is a little
disruptive and not very useful for existing apps. I think we should separate
out the allocations work into its own classes on the RM and the NM, with new
APIs added for the RM and the NM. I don't think we can get away with modifying
the existing APIs, the one exception being the allocate call, where we can add
an additional flag to indicate whether an allocation or a container is desired.
Internally, we can change the implementation so that the container model uses
allocations, but I think allocations will need their own state machine with
slightly different semantics than containers (on both the RM and the NM).
> De-link container life cycle from an Allocation
> -----------------------------------------------
>
> Key: YARN-1040
> URL: https://issues.apache.org/jira/browse/YARN-1040
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Affects Versions: 3.0.0
> Reporter: Steve Loughran
> Attachments: YARN-1040-rough-design.pdf
>
>
> The AM should be able to exec more than one process in a container, rather
> than have the NM automatically release the container when the single process
> exits. This would let an AM restart a process in the same container
> repeatedly, which for HBase would offer locality for a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that
> something could be run in the container while a long-lived process was
> already running. This can be useful in monitoring and reconfiguring the
> long-lived process, as well as shutting it down.