[
https://issues.apache.org/jira/browse/YARN-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700423#comment-14700423
]
Wangda Tan commented on YARN-1644:
----------------------------------
Discussed with [~jianhe], some thoughts:
There are 3 corner cases we need to handle:
1. The AM sends a decrease-container request to the RM before it sends the
increase-container request to the NM.
2. The RM crashes after it has issued a container increase, and the AM
increases the container on the NM while the NM is registering.
3. Same as 2, but the AM sends a decrease-container request to the RM before
the RM receives the NM-reported increased container.
What we may need to consider is a "container version": the RM adds 1 to the
container version whenever it increases or decreases a container. The
container version would be added to ContainerTokenIdentifier, to the increased
container reported by the NM, and to the NMContainerStatus sent while
registering.
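The version bookkeeping described above could look roughly like the sketch below. It is only an illustration under stated assumptions: VersionedContainer and VersionedToken are hypothetical names rather than existing YARN classes, the resource is reduced to a single memory value, and the version is assumed to start at 0.

```java
// Hypothetical sketch; these are NOT existing YARN classes.

/** Stand-in for the version-carrying part of ContainerTokenIdentifier. */
final class VersionedToken {
    final String containerId;
    final int memoryMb;
    final int version; // container version at the time the token was issued

    VersionedToken(String containerId, int memoryMb, int version) {
        this.containerId = containerId;
        this.memoryMb = memoryMb;
        this.version = version;
    }
}

/** RM-side view of one container (assumed shape, memory only). */
final class VersionedContainer {
    private final String containerId;
    private int memoryMb;
    private int version; // assumed to start at 0

    VersionedContainer(String containerId, int memoryMb) {
        this.containerId = containerId;
        this.memoryMb = memoryMb;
    }

    /**
     * Records an increase or decrease and adds 1 to the version.
     * Returns a snapshot carrying the version that belongs to this resize.
     */
    VersionedToken resize(int newMemoryMb) {
        this.memoryMb = newMemoryMb;
        this.version++;
        return new VersionedToken(containerId, memoryMb, version);
    }

    int getVersion() { return version; }

    int getMemoryMb() { return memoryMb; }
}
```

The point of the snapshot is that a token issued for an earlier resize keeps its older version even after the RM has processed a later resize, which is what lets the RM recognize a stale report from the NM.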
From the RM's view, it should keep the latest updated container resource. So
for the above corner cases:
1. Result: container decreased
2. Result: container increased
3. Result: container decreased (because the latest resource the AM sent to the
RM is a decrease).
So on the RM side, it will check:
{code}
if (rm.version >= nm.version) {
  // Keep the existing container in the RM unchanged, and tell the NM about this.
  // "==" is included here because rm.version == nm.version means corner
  // case #3 happened.
} else {
  // Change the container in the RM.
}
{code}
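As a rough, self-contained sketch of the check above applied to the three corner cases: ContainerVersionCheck, Action, and reconcile are hypothetical names, and the concrete version numbers assume that versions start at 0 and that a restarted RM recovers the pre-increase version.

```java
// Hypothetical sketch; names and version numbers are illustrative assumptions.
final class ContainerVersionCheck {

    /** What the RM does after comparing its version with the NM-reported one. */
    enum Action { KEEP_RM_STATE, APPLY_NM_STATE }

    /** Mirrors the check above: the RM wins on ties, the NM wins only if newer. */
    static Action reconcile(int rmVersion, int nmVersion) {
        return rmVersion >= nmVersion ? Action.KEEP_RM_STATE : Action.APPLY_NM_STATE;
    }

    public static void main(String[] args) {
        // Case 1: RM issued the increase (v1), then processed the decrease (v2);
        // the NM later reports the increase with v1.
        System.out.println("case 1: " + reconcile(2, 1));

        // Case 2: the recovered RM is assumed to still be at v0;
        // the NM reports the increase with v1.
        System.out.println("case 2: " + reconcile(0, 1));

        // Case 3: the recovered RM (v0) processed the decrease (v1) before
        // the NM reported the increase, which also carries v1.
        System.out.println("case 3: " + reconcile(1, 1));
    }
}
```

Under these assumptions, cases 1 and 3 keep the RM's (decreased) view and case 2 adopts the NM's (increased) view, matching the results listed earlier.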
So in summary, what we need in the protocol is:
- Container-version in ContainerTokenIdentifier
- Container-version in NMContainerStatus
- Add an IncreasedContainer to the NM-RM heartbeat, and include
container-version in IncreasedContainer.
Thoughts? [~mding]
> RM-NM protocol changes and NodeStatusUpdater implementation to support
> container resizing
> -----------------------------------------------------------------------------------------
>
> Key: YARN-1644
> URL: https://issues.apache.org/jira/browse/YARN-1644
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Reporter: Wangda Tan
> Assignee: MENG DING
> Attachments: YARN-1644-YARN-1197.4.patch,
> YARN-1644-YARN-1197.5.patch, YARN-1644.1.patch, YARN-1644.2.patch,
> YARN-1644.3.patch, yarn-1644.1.patch
>
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)