[ https://issues.apache.org/jira/browse/YARN-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700423#comment-14700423 ]

Wangda Tan commented on YARN-1644:
----------------------------------

I discussed this with [~jianhe]. Some thoughts:

There are three corner cases we need to handle:
1. The AM sends a container decrease request to the RM before it sends the container increase to the NM.
2. The RM crashes after issuing a container increase, and the AM increases the container on the NM while the NM is registering.
3. Same as 2, but the AM sends a container decrease request to the RM before the RM receives the NM's report of the increased container.

What we may need to consider is a "container version": the RM adds 1 to a container's version each time it increases or decreases that container. The container version will be added to ContainerTokenIdentifier, to the increased containers reported by the NM, and to NMContainerStatus when the NM registers.
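A minimal sketch of the versioning rule, assuming the RM keeps per-container state in an in-memory map. The class and method names (ContainerVersionTracker, allocate, resize) are hypothetical, not real YARN APIs. Increases and decreases are handled identically: each bumps the version by exactly 1, and the returned value is what would be embedded in the new container token.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical RM-side bookkeeping: every resize bumps the container version by 1. */
public class ContainerVersionTracker {
    /** Hypothetical per-container state; not a real YARN class. */
    static final class ContainerState {
        long memoryMb;
        int version; // assumed to start at 0 when the container is allocated

        ContainerState(long memoryMb) {
            this.memoryMb = memoryMb;
            this.version = 0;
        }
    }

    private final Map<String, ContainerState> containers = new HashMap<>();

    void allocate(String containerId, long memoryMb) {
        containers.put(containerId, new ContainerState(memoryMb));
    }

    /** Increase or decrease: either way the version goes up by exactly 1. */
    int resize(String containerId, long newMemoryMb) {
        ContainerState s = containers.get(containerId);
        if (s == null) {
            throw new IllegalArgumentException("unknown container " + containerId);
        }
        s.memoryMb = newMemoryMb;
        s.version += 1;
        return s.version; // would be carried in the new container token
    }

    int version(String containerId) {
        return containers.get(containerId).version;
    }

    long memoryMb(String containerId) {
        return containers.get(containerId).memoryMb;
    }

    public static void main(String[] args) {
        ContainerVersionTracker rm = new ContainerVersionTracker();
        rm.allocate("c1", 1024);
        rm.resize("c1", 4096); // increase -> version 1
        rm.resize("c1", 2048); // decrease -> version 2
        System.out.println(rm.version("c1") + " " + rm.memoryMb("c1"));
    }
}
```

Running main prints `2 2048`: one increase followed by one decrease leaves the container at version 2.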

From the RM's view, it should keep the most recently updated container resource. So for the above corner cases:
1. Result: container decreased
2. Result: container increased
3. Result: container decreased (because the latest resource the AM sent to the RM is a decrease).

So on the RM side, it will check:
{code}
if (rm.version >= nm.version) {
        // Keep the existing container in the RM unchanged, and tell the NM
        // about it. "==" is included because rm.version == nm.version means
        // corner case #3 happened.
} else {
        // Change the container in the RM.
}
{code}
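The same check as a runnable Java sketch, walked through the three corner cases. The method and enum names are hypothetical. The version numbers in main() rest on an assumption that is not stated in the comment: after a crash, the restarted RM is assumed to recover the container at its pre-increase version (v0).

```java
/**
 * Sketch of the RM-side version check. The method name, the enum, and the
 * version numbers used in main() are illustrative assumptions, not YARN APIs.
 */
public class ContainerVersionCheck {
    enum Decision {
        KEEP_RM_AND_NOTIFY_NM, // RM's resource wins; the NM is told to use it
        UPDATE_RM              // NM reported a newer version; the RM adopts it
    }

    static Decision reconcile(int rmVersion, int nmVersion) {
        // ">=" rather than ">": equal versions indicate corner case #3.
        return rmVersion >= nmVersion ? Decision.KEEP_RM_AND_NOTIFY_NM : Decision.UPDATE_RM;
    }

    public static void main(String[] args) {
        // Case 1: increase issued (v1), then decrease (v2) before the AM sends
        // the increase to the NM, so the NM holds at most v1.
        System.out.println("case 1: " + reconcile(2, 1));
        // Case 2 (assumption: restarted RM recovers the container at v0);
        // the NM applied the increase and reports v1.
        System.out.println("case 2: " + reconcile(0, 1));
        // Case 3: as case 2, but the decrease reaches the RM first, moving it to v1.
        System.out.println("case 3: " + reconcile(1, 1));
    }
}
```

Under these assumptions the output is KEEP_RM_AND_NOTIFY_NM, UPDATE_RM, KEEP_RM_AND_NOTIFY_NM, i.e. decreased, increased, decreased, which matches the results listed for the three corner cases.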

So, in summary, what we need in the protocol is:
- A container version in ContainerTokenIdentifier
- A container version in NMContainerStatus
- A new IncreasedContainer entry in the NM-RM heartbeat, with the container version included in IncreasedContainer.
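A rough sketch of where the version would travel under these protocol changes. These are hypothetical stand-ins, not the real ContainerTokenIdentifier or NMContainerStatus classes, and only versioning-related fields are modeled. As a simplification, NMContainerStatus and the proposed IncreasedContainer share one report shape here; that is an assumption, not part of the proposal.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical stand-ins for the records the comment proposes to extend.
 * Only versioning-related fields are modeled; these are NOT the real YARN
 * ContainerTokenIdentifier / NMContainerStatus classes.
 */
public class ProtocolSketch {
    /** Stand-in for ContainerTokenIdentifier: the token carries the version. */
    static final class TokenIdSketch {
        final String containerId;
        final long memoryMb;
        final int containerVersion;

        TokenIdSketch(String containerId, long memoryMb, int containerVersion) {
            this.containerId = containerId;
            this.memoryMb = memoryMb;
            this.containerVersion = containerVersion;
        }
    }

    /**
     * Shape assumed for both the NMContainerStatus stand-in (sent at NM
     * registration) and the IncreasedContainer entry (sent in heartbeats).
     */
    static final class VersionedContainerReport {
        final String containerId;
        final long memoryMb;
        final int containerVersion;

        VersionedContainerReport(String containerId, long memoryMb, int containerVersion) {
            this.containerId = containerId;
            this.memoryMb = memoryMb;
            this.containerVersion = containerVersion;
        }

        /** The NM copies the version from the token it applied. */
        static VersionedContainerReport fromToken(TokenIdSketch token) {
            return new VersionedContainerReport(
                    token.containerId, token.memoryMb, token.containerVersion);
        }
    }

    /** Stand-in for an NM-RM heartbeat carrying the new IncreasedContainer list. */
    static final class HeartbeatSketch {
        final List<VersionedContainerReport> increasedContainers;

        HeartbeatSketch(List<VersionedContainerReport> increasedContainers) {
            this.increasedContainers =
                    Collections.unmodifiableList(new ArrayList<>(increasedContainers));
        }
    }

    public static void main(String[] args) {
        // The AM hands the NM an increase token at version 1; the NM applies it
        // and reports the increase, with its version, on the next heartbeat.
        TokenIdSketch token = new TokenIdSketch("c1", 4096, 1);
        HeartbeatSketch hb = new HeartbeatSketch(
                Collections.singletonList(VersionedContainerReport.fromToken(token)));
        VersionedContainerReport r = hb.increasedContainers.get(0);
        System.out.println(r.containerId + " " + r.memoryMb + " v" + r.containerVersion);
    }
}
```

Running main prints `c1 4096 v1`. The point is only that the RM can compare the reported version against its own, as in the check above.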

Thoughts? [~mding]

> RM-NM protocol changes and NodeStatusUpdater implementation to support 
> container resizing
> -----------------------------------------------------------------------------------------
>
>                 Key: YARN-1644
>                 URL: https://issues.apache.org/jira/browse/YARN-1644
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Wangda Tan
>            Assignee: MENG DING
>         Attachments: YARN-1644-YARN-1197.4.patch, 
> YARN-1644-YARN-1197.5.patch, YARN-1644.1.patch, YARN-1644.2.patch, 
> YARN-1644.3.patch, yarn-1644.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
