[ 
https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203496#comment-15203496
 ] 

Vinod Kumar Vavilapalli commented on YARN-1040:
-----------------------------------------------

Thanks for the document, [~asuresh]!

[~bikassaha]
 - Our comments crossed on Feb 25, so I didn't see yours.
 - Looking at the doc, I can see why it gives the impression of a redesign, but 
it is less of a redesign, and more of adding new functionality that needs new 
semantics.
 - Clearly the new naming makes it look like a lot of new changes for the apps, 
but that is the reality (for apps that want to use this new feature)!
 - We do make most of our decisions on JIRA. We can continue the discussion 
here. If need be, sure, we can send out a note on the dev lists.

So, with that out of the way, let's step back and look at the semantics first 
and foremost, and leave the discussions about renames and the expected level 
of changes for later.

h4. APIs
There are big differences between the two proposals w.r.t. the APIs. Even though 
your earlier proposal seems to assume that this can be made a localized change 
in the NM-side APIs, there are newer semantics that mandate new (and/or 
modified) APIs on both AM-NM and RM-AM interactions. A couple that come to 
mind:
 - *Allocation/container release*: We need two separate mechanisms from AM to 
RM for (a) releasing allocations whole-sale (and thereby killing all running 
containers inside) and (b) killing one or more containers running inside an 
allocation *directly* at the RM - this is an existing feature - because the app 
either doesn't want to open N connections to N nodes in the cluster, or simply 
because the NM is not accessible anymore/in-the-interim.
 - *Allocation/container exit notifications*: The AMs will further be 
interested in two separate back-notifications from the RM: (a) has the 
allocation itself been released completely by the platform, say due to 
preemption? or (b) has one of the containers running inside the allocation 
exited, so that the AM has to act on it? Remember that this is simply a 
disambiguation of our existing container-exit notification mechanism.
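
To make the two release paths and the two notification types concrete, here is a rough Java sketch. Everything in it is hypothetical: {{RmAllocationSketch}}, {{EventType}}, and the method names are illustrative only and are not existing or proposed YARN APIs. It also assumes that a whole-sale release reports a container-exit event for each killed container before the allocation-released event, which the discussion above does not settle.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only: none of these types or methods exist in YARN.
final class RmAllocationSketch {
    // The two back-notifications an AM would need to tell apart.
    enum EventType { CONTAINER_EXITED, ALLOCATION_RELEASED }

    static final class Event {
        final EventType type;
        final String allocationId;
        final String containerId; // null for allocation-level events

        Event(EventType type, String allocationId, String containerId) {
            this.type = type;
            this.allocationId = allocationId;
            this.containerId = containerId;
        }
    }

    // allocationId -> IDs of the containers currently running inside it
    private final Map<String, Set<String>> running = new HashMap<>();

    void addAllocation(String allocationId) {
        running.putIfAbsent(allocationId, new LinkedHashSet<>());
    }

    void startContainer(String allocationId, String containerId) {
        Set<String> containers = running.get(allocationId);
        if (containers == null) {
            throw new IllegalStateException("unknown allocation " + allocationId);
        }
        containers.add(containerId);
    }

    boolean isAllocated(String allocationId) {
        return running.containsKey(allocationId);
    }

    // (a) Release the allocation whole-sale: every container inside is killed.
    // Assumption: one CONTAINER_EXITED per killed container, then ALLOCATION_RELEASED.
    List<Event> releaseAllocation(String allocationId) {
        List<Event> events = new ArrayList<>();
        Set<String> containers = running.remove(allocationId);
        if (containers == null) {
            return events; // nothing to release
        }
        for (String containerId : containers) {
            events.add(new Event(EventType.CONTAINER_EXITED, allocationId, containerId));
        }
        events.add(new Event(EventType.ALLOCATION_RELEASED, allocationId, null));
        return events;
    }

    // (b) Kill a single container directly at the RM; the allocation stays held.
    List<Event> killContainer(String allocationId, String containerId) {
        List<Event> events = new ArrayList<>();
        Set<String> containers = running.get(allocationId);
        if (containers != null && containers.remove(containerId)) {
            events.add(new Event(EventType.CONTAINER_EXITED, allocationId, containerId));
        }
        return events;
    }
}
```

In this sketch, killing a container leaves {{isAllocated}} true for its allocation, while releasing the allocation is the only call that produces an {{ALLOCATION_RELEASED}} event.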

h4. Internals
Inside the RM too, the state-machine of the allocation itself is different 
from the containers' life-cycle. For example, the containers' life-cycle 
determines the completion notifications that we send across to the AMs, and 
only the allocation life-cycle impacts scheduling.
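
As a rough illustration of that split, the Java sketch below tracks node capacity purely by allocation. The class {{NodeCapacitySketch}} and its methods are hypothetical and do not correspond to any scheduler class in the code base; memory in MB is assumed as the only resource.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: not an actual YARN scheduler class.
final class NodeCapacitySketch {
    private final int totalMb;
    // allocationId -> memory held by that allocation
    private final Map<String, Integer> allocatedMb = new HashMap<>();

    NodeCapacitySketch(int totalMb) {
        this.totalMb = totalMb;
    }

    int availableMb() {
        int used = 0;
        for (int mb : allocatedMb.values()) {
            used += mb;
        }
        return totalMb - used;
    }

    // Scheduling decision: succeeds only if the node has room for the allocation.
    boolean allocate(String allocationId, int mb) {
        if (allocatedMb.containsKey(allocationId) || mb > availableMb()) {
            return false;
        }
        allocatedMb.put(allocationId, mb);
        return true;
    }

    // Container life-cycle event: would drive a completion notification to the AM,
    // but intentionally leaves the capacity accounting untouched.
    void onContainerExit(String allocationId, String containerId) {
        // no scheduling impact in this sketch
    }

    // Allocation life-cycle event: the only transition that frees capacity.
    void onAllocationReleased(String allocationId) {
        allocatedMb.remove(allocationId);
    }
}
```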

h4. Compatibility for existing apps
What is proposed in the doc, as well as the way I originally described it, is 
definitely backwards compatible. Existing applications do not need a single 
line of change. Only newer versions of applications that desire to use the new 
feature have to use newer APIs - something that is not different from any other 
core YARN feature at all.

h4. Changes for apps that want to use the new feature
Even in your proposal, an app/framework that desires to use the new feature has 
to make non-trivial changes in the AM to use this feature correctly:
 - generating containerIDs
 - managing the list of containers running inside an allocation
 - managing the outstanding unused portion of an allocation, and incrementally 
launching more and more containers till the allocation is full
 - Containers running under non-reusable allocations do not need an explicit 
signal to the RM for clean up - apps can simply stop the container on the NM 
and everything else gets automatically taken care of. Apps that start using the 
new feature, on the other hand, will *have* to now also explicitly release 
allocations outside of the life-cycle of the containers.
 - We can optionally add auxiliary flags to inform NMs to auto-reap the 
allocation when the last container dies (only for apps that are okay with 
this), but either way the apps need changes to do this as they intend it.
 - Apps will also have to react differently on container-exit notifications and 
allocation-released/preempted notifications.
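
The bookkeeping listed above could look roughly like the following Java sketch on the AM side. All names here ({{AmAllocationLedger}}, the {{autoReap}} flag, the container ID format) are hypothetical and chosen for illustration; memory in MB stands in for the full resource vector.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical AM-side bookkeeping for one reusable allocation; not a YARN class.
final class AmAllocationLedger {
    private final String allocationId;
    private final int capacityMb;
    private final boolean autoReap; // hypothetical flag: reap when the last container dies
    // containerId -> memory used by that running container
    private final Map<String, Integer> running = new LinkedHashMap<>();
    private int nextSeq = 1;
    private boolean released = false;

    AmAllocationLedger(String allocationId, int capacityMb, boolean autoReap) {
        this.allocationId = allocationId;
        this.capacityMb = capacityMb;
        this.autoReap = autoReap;
    }

    // Outstanding unused portion of the allocation.
    int unusedMb() {
        int used = 0;
        for (int mb : running.values()) {
            used += mb;
        }
        return capacityMb - used;
    }

    // Generates a container ID and records the launch; returns null if the
    // request does not fit or the allocation is already gone.
    String launch(int memoryMb) {
        if (released || memoryMb <= 0 || memoryMb > unusedMb()) {
            return null;
        }
        String containerId = allocationId + "_c" + nextSeq++; // assumed ID format
        running.put(containerId, memoryMb);
        return containerId;
    }

    // Container-exit notification: frees space inside the allocation only.
    void onContainerExit(String containerId) {
        boolean wasRunning = running.remove(containerId) != null;
        if (wasRunning && autoReap && running.isEmpty()) {
            released = true; // mirrors the optional auto-reap behaviour
        }
    }

    // Allocation-released/preempted notification from the platform.
    void onAllocationReleased() {
        running.clear();
        released = true;
    }

    // Explicit release by the app, independent of any container's life-cycle.
    void release() {
        running.clear();
        released = true;
    }

    boolean isReleased() {
        return released;
    }
}
```

Without the auto-reap flag, the ledger stays open after the last container exits, so the app must call {{release()}} itself; that is exactly the extra step that non-reusable allocations never needed.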

Given the points above, I don't think we can get away with just an NM side API 
change.

Depending on how much we have to change the APIs, I am willing to go either way 
on the degree of renames in the API surface area. Inside the code base though, 
I think we are better off calling things what they are.

> De-link container life cycle from an Allocation
> -----------------------------------------------
>
>                 Key: YARN-1040
>                 URL: https://issues.apache.org/jira/browse/YARN-1040
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 3.0.0
>            Reporter: Steve Loughran
>         Attachments: YARN-1040-rough-design.pdf
>
>
> The AM should be able to exec >1 process in a container, rather than have the 
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, 
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that 
> something could be run in the container while a long-lived process was 
> already running. This can be useful in monitoring and reconfiguring the 
> long-lived process, as well as shutting it down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
