[
https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203496#comment-15203496
]
Vinod Kumar Vavilapalli commented on YARN-1040:
-----------------------------------------------
Thanks for the document, [~asuresh]!
[~bikassaha]
- Our comments crossed on Feb 25, so I didn't see yours.
- Looking at the doc, I can see why it gives the impression of a redesign, but
it is less of a redesign, and more of adding new functionality that needs new
semantics.
- Clearly the new naming makes it look like a lot of new changes for the apps,
but that is the reality (for apps that want to use this new feature)!
- We do make most of our decisions on JIRA. We can continue the discussion
here. If need be, sure, we can send out a note on the dev lists.
So, with that out of the way, let's step back and look at the semantics first
and foremost, and defer the discussions about renames and the expected level
of changes until later.
h4. APIs
There are big differences between the two proposals w.r.t. the APIs. Even though
your earlier proposal appears to assume that this can be made a localized
change in the NM-side APIs, there are newer semantics that mandate new (and/or
modified) APIs on both AM-NM and RM-AM interactions. A couple that come to
mind:
- *Allocation/container release*: We need two separate mechanisms from AM to
RM for (a) releasing allocations wholesale (thereby killing all containers
running inside) and (b) killing one or more containers running inside an
allocation *directly* at the RM. The latter is an existing feature, used
because the app either doesn't want to open N connections to N nodes in the
cluster, or simply because the NM is not accessible anymore or in the interim.
- *Allocation/container exit notifications*: The AMs will further be
interested in two separate back-notifications from the RM: (a) has the
allocation itself been released completely by the platform, say due to
preemption? or (b) has one of the containers running inside the allocation
exited, so that the AM has to act on it? Remember that this is simply a
disambiguation of our existing container-exit notification mechanism.
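
To make the two bullets above concrete, here is a rough sketch of the split
between container-level and allocation-level requests and notifications. It is
a toy in-memory model, not YARN code: {{ToyRM}}, {{Event}}, {{EventType}} and
all method names are hypothetical and exist only for illustration. It also
assumes that releasing an allocation reports an exit for each container still
running inside it before the allocation-level event, which is one possible
choice and not something settled here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model; none of these types exist in YARN.
final class AllocationApiSketch {
    // The two notifications the AM has to tell apart.
    enum EventType { CONTAINER_EXITED, ALLOCATION_RELEASED }

    static final class Event {
        final EventType type;
        final String allocationId;
        final String containerId; // null for allocation-level events

        Event(EventType type, String allocationId, String containerId) {
            this.type = type;
            this.allocationId = allocationId;
            this.containerId = containerId;
        }
    }

    // Stand-in for the RM's view: allocation id -> containers running inside.
    static final class ToyRM {
        private final Map<String, Set<String>> running = new LinkedHashMap<>();
        private final List<Event> outbox = new ArrayList<>();

        // Records a container; creates the allocation entry on first use.
        void startContainer(String allocationId, String containerId) {
            running.computeIfAbsent(allocationId, k -> new LinkedHashSet<>())
                   .add(containerId);
        }

        // (b) Kill one container at the RM; the allocation itself survives.
        void killContainer(String allocationId, String containerId) {
            Set<String> containers = running.get(allocationId);
            if (containers != null && containers.remove(containerId)) {
                outbox.add(new Event(EventType.CONTAINER_EXITED,
                                     allocationId, containerId));
            }
        }

        // (a) Release the allocation wholesale, killing everything inside.
        void releaseAllocation(String allocationId) {
            Set<String> containers = running.remove(allocationId);
            if (containers == null) {
                return; // unknown or already released
            }
            for (String containerId : containers) {
                outbox.add(new Event(EventType.CONTAINER_EXITED,
                                     allocationId, containerId));
            }
            outbox.add(new Event(EventType.ALLOCATION_RELEASED,
                                 allocationId, null));
        }

        boolean hasAllocation(String allocationId) {
            return running.containsKey(allocationId);
        }

        // Returns pending notifications and clears the queue.
        List<Event> drainEvents() {
            List<Event> drained = new ArrayList<>(outbox);
            outbox.clear();
            return drained;
        }
    }
}
```

In this sketch, killing a container leaves the allocation in place for reuse,
while releasing the allocation is the only call that removes it.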
h4. Internals
Inside the RM too, the state machine of the allocation itself is different
from the containers' life-cycle. For example, the containers' life-cycle
determines the completion notifications that we send across to the AMs, and
only the allocation life-cycle impacts scheduling.
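
As a rough illustration of that separation, the sketch below keeps two
independent state maps: container completion only produces a notification,
and only an allocation release returns capacity for scheduling. All names
({{ToyNodeAccounting}}, the two enums, the memory-only accounting) are
hypothetical simplifications and not the RM's actual classes or states; for
brevity the sketch does not even link containers to their allocations.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model; the real RM state machines have many more states.
final class LifecycleSketch {
    enum AllocationState { ACTIVE, RELEASED }
    enum ContainerState { RUNNING, COMPLETED }

    static final class ToyNodeAccounting {
        private int freeMb;
        private final Map<String, Integer> allocationMb = new HashMap<>();
        private final Map<String, AllocationState> allocations = new HashMap<>();
        private final Map<String, ContainerState> containers = new HashMap<>();
        private final List<String> completionNotices = new ArrayList<>();

        ToyNodeAccounting(int totalMb) {
            this.freeMb = totalMb;
        }

        // Allocation life-cycle: reserving capacity affects scheduling.
        void allocate(String allocationId, int mb) {
            if (mb > freeMb) {
                throw new IllegalStateException("not enough capacity");
            }
            freeMb -= mb;
            allocationMb.put(allocationId, mb);
            allocations.put(allocationId, AllocationState.ACTIVE);
        }

        void startContainer(String containerId) {
            containers.put(containerId, ContainerState.RUNNING);
        }

        // Container life-cycle: yields a notice, leaves capacity untouched.
        void containerCompleted(String containerId) {
            if (containers.get(containerId) == ContainerState.RUNNING) {
                containers.put(containerId, ContainerState.COMPLETED);
                completionNotices.add(containerId);
            }
        }

        // Allocation life-cycle: only this returns capacity to the scheduler.
        void releaseAllocation(String allocationId) {
            if (allocations.get(allocationId) == AllocationState.ACTIVE) {
                allocations.put(allocationId, AllocationState.RELEASED);
                freeMb += allocationMb.get(allocationId);
            }
        }

        int freeMb() {
            return freeMb;
        }

        List<String> completionNotices() {
            return new ArrayList<>(completionNotices);
        }
    }
}
```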
h4. Compatibility for existing apps
What is proposed in the doc, as well as the way I originally described it, is
definitely backwards compatible. Existing applications do not need a single
line of change. Only newer versions of applications that desire to use the new
feature have to use newer APIs, which is no different from any other core YARN
feature.
h4. Changes for apps that want to use the new feature
Even in your proposal, an app/framework that desires to use the new feature has
to make non-trivial changes in the AM to use this feature correctly:
- generating containerIDs
- managing the list of containers running inside an allocation
- managing the outstanding unused portion of an allocation, and incrementally
launching more and more containers till the allocation is full
- Containers running under non-reusable allocations do not need an explicit
signal to the RM for clean up - apps can simply stop the container on the NM
and everything else gets automatically taken care of. Apps that start using the
new feature, on the other hand, will now *have* to also explicitly release
allocations outside of the life-cycle of the containers.
- We can optionally add auxiliary flags to inform NMs to auto-reap the
allocation when the last container dies (only for apps that are okay with
this), but either way the apps need changes to do this as they intend it.
- Apps will also have to react differently on container-exit notifications and
allocation-released/preempted notifications.
Given the points above, I don't think we can get away with just an NM side API
change.
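
To give a feel for the AM-side bookkeeping listed above, here is a minimal
sketch of a per-allocation tracker: it generates container IDs, tracks which
containers are running, tracks the unused portion, and requires an explicit
release. The class {{ReusableAllocationTracker}}, its methods, the ID format
and the memory-only sizing are all hypothetical choices made for illustration,
not part of either proposal.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical AM-side helper; not an existing YARN class.
final class ReusableAllocationTracker {
    private final String allocationId;
    private final int capacityMb;
    // containerId -> memory used by that container
    private final Map<String, Integer> running = new LinkedHashMap<>();
    private int usedMb;
    private int nextSequence = 1;
    private boolean released;

    ReusableAllocationTracker(String allocationId, int capacityMb) {
        this.allocationId = allocationId;
        this.capacityMb = capacityMb;
    }

    // Launches a container if it fits in the unused portion.
    // Returns the generated container ID, or empty if it does not fit.
    Optional<String> tryLaunch(int memoryMb) {
        if (released) {
            throw new IllegalStateException("allocation already released");
        }
        if (memoryMb <= 0 || usedMb + memoryMb > capacityMb) {
            return Optional.empty();
        }
        // The AM generates the ID itself (assumed format).
        String containerId = allocationId + "_c" + nextSequence++;
        running.put(containerId, memoryMb);
        usedMb += memoryMb;
        return Optional.of(containerId);
    }

    // Container-exit notification: frees that share, keeps the allocation.
    void onContainerExit(String containerId) {
        Integer mb = running.remove(containerId);
        if (mb != null) {
            usedMb -= mb;
        }
    }

    // Explicit release, independent of any container's life-cycle.
    void release() {
        running.clear();
        usedMb = 0;
        released = true;
    }

    int unusedMb() {
        return capacityMb - usedMb;
    }

    int runningCount() {
        return running.size();
    }

    boolean isReleased() {
        return released;
    }
}
```

An AM would call {{tryLaunch}} repeatedly until it returns empty, refill the
freed portion after each {{onContainerExit}}, and call {{release}} once it is
done with the allocation.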
Depending on how much we have to change the APIs, I am willing to go either way
on the degree of renames in the API surface area. Inside the code base though,
I think we are better off calling things what they are.
> De-link container life cycle from an Allocation
> -----------------------------------------------
>
> Key: YARN-1040
> URL: https://issues.apache.org/jira/browse/YARN-1040
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Affects Versions: 3.0.0
> Reporter: Steve Loughran
> Attachments: YARN-1040-rough-design.pdf
>
>
> The AM should be able to exec >1 process in a container, rather than have the
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly,
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that
> something could be run in the container while a long-lived process was
> already running. This can be useful in monitoring and reconfiguring the
> long-lived process, as well as shutting it down.