[ https://issues.apache.org/jira/browse/YARN-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701590#comment-14701590 ]
MENG DING commented on YARN-1644:
---------------------------------

Thanks a lot [~leftnoteasy] and [~jianhe] for your comments and suggestions.

After more thought, I prefer [~jianhe]'s suggestion to synchronize {{ContainerManagerImpl#increaseContainersResource}} with NM-RM registration. If we do that, we should be able to resolve the race condition during RM recovery. More specifically:

* If increaseContainersResource happens first, the container resource will be increased in the NM before NM-RM registration.
* If NM-RM registration happens first, the NM will get a new RM identifier after registration. Any subsequent increase request with a token issued by the old RM will be rejected.

For the implementation, I think I can simply synchronize on the {{NMContext}} object in both {{ContainerManagerImpl}} and {{NodeStatusUpdaterImpl}}.

Let me know if you have further thoughts or comments. I am also wondering whether we should do the same for {{ContainerManagerImpl#startContainers}}.

> RM-NM protocol changes and NodeStatusUpdater implementation to support container resizing
> -----------------------------------------------------------------------------------------
>
>                 Key: YARN-1644
>                 URL: https://issues.apache.org/jira/browse/YARN-1644
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Wangda Tan
>            Assignee: MENG DING
>         Attachments: YARN-1644-YARN-1197.4.patch, YARN-1644-YARN-1197.5.patch, YARN-1644.1.patch, YARN-1644.2.patch, YARN-1644.3.patch, yarn-1644.1.patch
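
For illustration, the synchronization proposed in the comment above could be sketched roughly as follows. This is a simplified model, not the actual Hadoop code: {{SketchContext}}, {{SketchStatusUpdater}}, {{SketchContainerManager}} and all of their fields and methods are hypothetical stand-ins for {{NMContext}}, {{NodeStatusUpdaterImpl}} and the container manager, and the token check is reduced to comparing a single RM identifier.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for NMContext: shared state that also serves as the lock. */
final class SketchContext {
    long rmIdentifier; // identifier of the RM this NM last registered with (assumed field)
    final Map<String, Integer> containerMemoryMb = new HashMap<>();
}

/** Hypothetical stand-in for the registration path of the node status updater. */
final class SketchStatusUpdater {
    private final SketchContext context;

    SketchStatusUpdater(SketchContext context) {
        this.context = context;
    }

    void registerWithRM(long newRmIdentifier) {
        // Same monitor as the increase path, so the two cannot interleave.
        synchronized (context) {
            context.rmIdentifier = newRmIdentifier;
        }
    }
}

/** Hypothetical stand-in for the increase path of the container manager. */
final class SketchContainerManager {
    private final SketchContext context;

    SketchContainerManager(SketchContext context) {
        this.context = context;
    }

    /** Returns true if the increase was applied, false if the token was rejected. */
    boolean increaseContainerResource(String containerId, int newMemoryMb,
                                      long tokenRmIdentifier) {
        synchronized (context) {
            // A token issued by an RM the NM is no longer registered with is rejected.
            if (tokenRmIdentifier != context.rmIdentifier) {
                return false;
            }
            context.containerMemoryMb.put(containerId, newMemoryMb);
            return true;
        }
    }
}

final class ResizeRaceSketch {
    public static void main(String[] args) {
        SketchContext context = new SketchContext();
        context.rmIdentifier = 1L;
        SketchContainerManager manager = new SketchContainerManager(context);
        SketchStatusUpdater updater = new SketchStatusUpdater(context);

        // Case 1: the increase happens before re-registration and is applied.
        System.out.println(manager.increaseContainerResource("container_1", 2048, 1L));

        // Case 2: re-registration happens first; a token from the old RM is rejected.
        updater.registerWithRM(2L);
        System.out.println(manager.increaseContainerResource("container_1", 4096, 1L));
    }
}
```

Because both paths take the same monitor, an increase either completes entirely before registration or observes the new RM identifier and rejects the stale token; there is no interleaving in which a token from the old RM is accepted after registration with the new RM.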