[
https://issues.apache.org/jira/browse/YARN-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642078#comment-14642078
]
Jian He commented on YARN-1644:
-------------------------------
bq. add the increasedContainers to the RegisterNodeManagerRequestProto, make
sure that a container is only removed from increasedContainers when its resize
is completed in NM.
IIUC, this has the same problem? NM re-registration can still happen between
the time the increase action is accepted and the time the container is added to
increasedContainers. Even startContainer has the same problem: a newly started
container may fall into this tiny window, in which case the RM won't recover it.
Maybe the RM could also react to the node heartbeat with respect to container
increases, as necessary, to handle this race condition?
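
To make the heartbeat idea concrete, here is a minimal sketch in Java. It is
only an illustration under stated assumptions: the class HeartbeatReconciler,
the Action enum and the reconcile method are hypothetical names, not part of
YARN or of any patch attached to this issue, and container sizes are modelled
as plain memory values in MB rather than real Resource objects.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch; none of these names are YARN APIs. */
final class HeartbeatReconciler {

    /** Result of comparing the RM's view of one container with the NM's report. */
    enum Action { IN_SYNC, SIZE_MISMATCH, UNKNOWN_CONTAINER }

    /**
     * Compares the container sizes (MB) reported in a node heartbeat with the
     * sizes the RM has on record. A container that missed the registration
     * window shows up here as either UNKNOWN_CONTAINER (start raced with
     * re-registration) or SIZE_MISMATCH (increase raced with re-registration).
     */
    static Map<String, Action> reconcile(Map<String, Integer> rmViewMb,
                                         Map<String, Integer> nmReportedMb) {
        // TreeMap only to keep the output order deterministic.
        Map<String, Action> actions = new TreeMap<>();
        for (Map.Entry<String, Integer> e : nmReportedMb.entrySet()) {
            Integer known = rmViewMb.get(e.getKey());
            if (known == null) {
                actions.put(e.getKey(), Action.UNKNOWN_CONTAINER);
            } else if (!known.equals(e.getValue())) {
                actions.put(e.getKey(), Action.SIZE_MISMATCH);
            } else {
                actions.put(e.getKey(), Action.IN_SYNC);
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        Map<String, Integer> rmView = new TreeMap<>();
        rmView.put("c1", 1024);
        rmView.put("c2", 1024);   // RM never learned about the increase

        Map<String, Integer> nmReport = new TreeMap<>();
        nmReport.put("c1", 1024);
        nmReport.put("c2", 2048); // increased on the NM
        nmReport.put("c3", 512);  // started on the NM, unknown to the RM

        System.out.println(reconcile(rmView, nmReport));
    }
}
```

How the RM should act on SIZE_MISMATCH or UNKNOWN_CONTAINER (accept the NM's
value, or ask the NM to roll back or kill the container) is left open here;
the sketch only shows that the heartbeat carries enough information to detect
containers that slipped through the re-registration window.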
> RM-NM protocol changes and NodeStatusUpdater implementation to support
> container resizing
> -----------------------------------------------------------------------------------------
>
> Key: YARN-1644
> URL: https://issues.apache.org/jira/browse/YARN-1644
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Reporter: Wangda Tan
> Assignee: MENG DING
> Attachments: YARN-1644-YARN-1197.4.patch,
> YARN-1644-YARN-1197.5.patch, YARN-1644.1.patch, YARN-1644.2.patch,
> YARN-1644.3.patch, yarn-1644.1.patch
>
>