[ https://issues.apache.org/jira/browse/YARN-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701590#comment-14701590 ]
MENG DING commented on YARN-1644:
---------------------------------
Thanks a lot [~leftnoteasy] and [~jianhe] for your comments and suggestions.
After more thought, I prefer [~jianhe]'s suggestion to synchronize
{{ContainerManagerImpl#increaseContainersResource}} with NM-RM registration.
Doing so should resolve the race condition during RM recovery. More
specifically:
* If increaseContainersResource happens first, the container resource will be
increased in the NM before NM-RM registration.
* If NM-RM registration happens first, the NM will get a new RM identifier
upon registration, and any subsequent increase request carrying a token issued
by the old RM will be rejected.
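The two orderings above can be modeled with a small, self-contained sketch. {{NodeState}}, {{IncreaseToken}} and the numeric RM identifiers are hypothetical simplifications, not YARN classes; in particular, both code paths live in one class here, whereas in the NM they are split between {{ContainerManagerImpl}} and {{NodeStatusUpdaterImpl}}. The only point it illustrates is that once increase and registration are serialized, each ordering has a well-defined outcome.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the two orderings; not the real YARN classes.
public class IncreaseVsRegistrationDemo {

    /** Hypothetical token: records which RM instance issued the increase. */
    static final class IncreaseToken {
        final String containerId;
        final long rmIdentifier;
        final int newMemoryMb;

        IncreaseToken(String containerId, long rmIdentifier, int newMemoryMb) {
            this.containerId = containerId;
            this.rmIdentifier = rmIdentifier;
            this.newMemoryMb = newMemoryMb;
        }
    }

    /** Hypothetical NM state touched by both the increase path and registration. */
    static final class NodeState {
        private long currentRmIdentifier;
        private final Map<String, Integer> containerMemoryMb = new HashMap<>();

        NodeState(long rmIdentifier) {
            this.currentRmIdentifier = rmIdentifier;
        }

        synchronized void putContainer(String containerId, int memoryMb) {
            containerMemoryMb.put(containerId, memoryMb);
        }

        synchronized int memoryOf(String containerId) {
            return containerMemoryMb.get(containerId);
        }

        /** Increase path: reject tokens issued by a different (old) RM instance. */
        synchronized boolean increase(IncreaseToken token) {
            if (token.rmIdentifier != currentRmIdentifier) {
                return false;
            }
            containerMemoryMb.put(token.containerId, token.newMemoryMb);
            return true;
        }

        /** Registration path: the NM learns the identifier of the (new) RM. */
        synchronized void register(long newRmIdentifier) {
            currentRmIdentifier = newRmIdentifier;
        }
    }

    public static void main(String[] args) {
        long oldRm = 1L;
        long newRm = 2L;
        IncreaseToken token = new IncreaseToken("container_1", oldRm, 4096);

        // Ordering 1: increase before registration -> applied, then NM re-registers.
        NodeState first = new NodeState(oldRm);
        first.putContainer("container_1", 2048);
        boolean accepted1 = first.increase(token);
        first.register(newRm);
        System.out.println("increase first: accepted=" + accepted1
                + ", memoryMb=" + first.memoryOf("container_1"));

        // Ordering 2: registration before increase -> old-RM token rejected.
        NodeState second = new NodeState(oldRm);
        second.putContainer("container_1", 2048);
        second.register(newRm);
        boolean accepted2 = second.increase(token);
        System.out.println("register first: accepted=" + accepted2
                + ", memoryMb=" + second.memoryOf("container_1"));
    }
}
```

Running it prints {{increase first: accepted=true, memoryMb=4096}} followed by {{register first: accepted=false, memoryMb=2048}}, i.e. the two bullets above.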
For the implementation, I think I can simply synchronize on the {{NMContext}}
object in both {{ContainerManagerImpl}} and {{NodeStatusUpdaterImpl}}.
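A minimal sketch of the locking idea, assuming two separate classes that share one context instance and use it as the monitor. {{Context}}, {{ContainerManager}} and {{StatusUpdater}} are hypothetical stand-ins for {{NMContext}}, {{ContainerManagerImpl}} and {{NodeStatusUpdaterImpl}}; the real method bodies are not shown. The sketch only counts how many critical sections are active at once, which stays at 1 because both classes lock the same object.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: two collaborators lock the same shared context object.
public class SharedContextLockDemo {

    /** Stand-in for the shared NM context; the instance itself is the lock. */
    static final class Context {
        private final AtomicInteger inside = new AtomicInteger();
        private final AtomicInteger maxInside = new AtomicInteger();

        /** Record entry into a critical section and track the peak overlap. */
        void enter() {
            int now = inside.incrementAndGet();
            maxInside.accumulateAndGet(now, Math::max);
        }

        void exit() {
            inside.decrementAndGet();
        }

        int maxConcurrentCriticalSections() {
            return maxInside.get();
        }
    }

    /** Stand-in for the container-manager side (e.g. a resource increase). */
    static final class ContainerManager {
        private final Context context;
        ContainerManager(Context context) { this.context = context; }

        void increaseResource() {
            synchronized (context) {
                context.enter();
                Thread.yield(); // give the other thread a chance to interleave
                context.exit();
            }
        }
    }

    /** Stand-in for the status-updater side (e.g. NM-RM registration). */
    static final class StatusUpdater {
        private final Context context;
        StatusUpdater(Context context) { this.context = context; }

        void register() {
            synchronized (context) {
                context.enter();
                Thread.yield();
                context.exit();
            }
        }
    }

    /** Run both paths concurrently; return the peak number of overlapping sections. */
    static int runConcurrently(int iterations) throws InterruptedException {
        Context context = new Context();
        ContainerManager cm = new ContainerManager(context);
        StatusUpdater su = new StatusUpdater(context);
        CountDownLatch start = new CountDownLatch(1);

        Thread t1 = new Thread(() -> {
            awaitQuietly(start);
            for (int i = 0; i < iterations; i++) cm.increaseResource();
        });
        Thread t2 = new Thread(() -> {
            awaitQuietly(start);
            for (int i = 0; i < iterations; i++) su.register();
        });
        t1.start();
        t2.start();
        start.countDown(); // release both threads at roughly the same time
        t1.join();
        t2.join();
        return context.maxConcurrentCriticalSections();
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("peak overlap: " + runConcurrently(10_000));
    }
}
```

Using the context instance itself as the monitor avoids adding a new lock field, at the cost that any other code synchronizing on the same object also contends for it.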
Let me know if you have further thoughts or comments. I am also wondering
whether we should do the same for {{ContainerManagerImpl#startContainers}}.
> RM-NM protocol changes and NodeStatusUpdater implementation to support
> container resizing
> -----------------------------------------------------------------------------------------
>
> Key: YARN-1644
> URL: https://issues.apache.org/jira/browse/YARN-1644
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Reporter: Wangda Tan
> Assignee: MENG DING
> Attachments: YARN-1644-YARN-1197.4.patch,
> YARN-1644-YARN-1197.5.patch, YARN-1644.1.patch, YARN-1644.2.patch,
> YARN-1644.3.patch, yarn-1644.1.patch
>
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)