MENG DING commented on YARN-1644:

Thanks a lot [~leftnoteasy] and [~jianhe] for your comments and suggestions.

After more thought, I prefer [~jianhe]'s suggestion to synchronize 
{{ContainerManagerImpl#increaseContainersResource}} with NM-RM registration. 
Doing so should resolve the RM recovery race condition. More specifically:
* If {{increaseContainersResource}} happens first, the container resource is 
increased in the NM before NM-RM registration.
* If NM-RM registration happens first, the NM gets a new RM identifier after 
registration, and any subsequent increase request carrying a token issued by 
the old RM is rejected.

For the implementation, I think I can simply synchronize on the {{NMContext}} 
object in both {{ContainerManagerImpl}} and {{NodeStatusUpdaterImpl}}.
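
A minimal sketch of that locking scheme, for illustration only. The classes 
below ({{Context}}, {{ContainerManager}}, {{StatusUpdater}}) and their methods 
are hypothetical stand-ins, not the actual YARN classes or signatures, and the 
RM identifier check is reduced to a simple comparison of two long values.

```java
// Hypothetical stand-ins for NMContext, ContainerManagerImpl and
// NodeStatusUpdaterImpl; none of these signatures come from the YARN code base.
final class ResyncSketch {

    /** Shared context object; both sides use it as the lock. */
    static final class Context {
        private long rmIdentifier;

        Context(long rmIdentifier) {
            this.rmIdentifier = rmIdentifier;
        }
    }

    /** Handles resource increase requests for a single container. */
    static final class ContainerManager {
        private final Context context;
        private int containerMemoryMb;

        ContainerManager(Context context, int initialMemoryMb) {
            this.context = context;
            this.containerMemoryMb = initialMemoryMb;
        }

        /** Returns false if the token was issued by an RM other than the current one. */
        boolean increaseContainerResource(long tokenRmIdentifier, int targetMemoryMb) {
            synchronized (context) {
                // The check and the update happen under one lock, so a
                // registration cannot slip in between them.
                if (tokenRmIdentifier != context.rmIdentifier) {
                    return false;
                }
                containerMemoryMb = targetMemoryMb;
                return true;
            }
        }

        int getContainerMemoryMb() {
            synchronized (context) {
                return containerMemoryMb;
            }
        }
    }

    /** Registers with the RM and records the RM identifier it hands back. */
    static final class StatusUpdater {
        private final Context context;

        StatusUpdater(Context context) {
            this.context = context;
        }

        void registerWithRM(long newRmIdentifier) {
            synchronized (context) {
                // In the real NM this block would also cover building the
                // container report sent with the registration (assumption).
                context.rmIdentifier = newRmIdentifier;
            }
        }
    }

    public static void main(String[] args) {
        Context context = new Context(1L);
        ContainerManager manager = new ContainerManager(context, 1024);
        StatusUpdater updater = new StatusUpdater(context);

        System.out.println("increase with current RM token accepted: "
            + manager.increaseContainerResource(1L, 2048));
        updater.registerWithRM(2L); // RM restarted, new identifier
        System.out.println("increase with stale RM token accepted: "
            + manager.increaseContainerResource(1L, 4096));
    }
}
```

Because both methods take the same monitor, each increase request either 
completes entirely before registration or sees the new RM identifier and is 
rejected. These are the two orderings listed above.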

Let me know if you have further thoughts or comments. I am also wondering 
whether we should do the same for {{ContainerManagerImpl#startContainers}}.

> RM-NM protocol changes and NodeStatusUpdater implementation to support 
> container resizing
> -----------------------------------------------------------------------------------------
>                 Key: YARN-1644
>                 URL: https://issues.apache.org/jira/browse/YARN-1644
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Wangda Tan
>            Assignee: MENG DING
>         Attachments: YARN-1644-YARN-1197.4.patch, 
> YARN-1644-YARN-1197.5.patch, YARN-1644.1.patch, YARN-1644.2.patch, 
> YARN-1644.3.patch, yarn-1644.1.patch

