[ 
https://issues.apache.org/jira/browse/YARN-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641352#comment-14641352
 ] 

MENG DING commented on YARN-1644:
---------------------------------

bq. Even if NM succeeded to send the increasedContainers to RM, if NM 
re-registers back with RM before the container size is updated, RM will also 
recover the container with old resources.

The {{increasedContainers}} list contains the target resource size for each 
container being increased, so if NM re-registers with RM before the container 
size is updated, RM will check both {{containerStatus}} (which contains the old 
resource for the container) *and* {{increasedContainers}} (which contains the 
target resource for the container) in the {{RegisterNodeManager}} request, and 
will be able to recover the correct container size.
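
The recovery rule above could be sketched roughly as follows. This is only an 
illustration under simplifying assumptions: memory in MB is the only resource 
dimension, containers are keyed by a plain string id, and 
{{RegistrationRecoverySketch}} and {{recoverSizes}} are hypothetical names that 
do not exist in YARN or in the patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the actual RM code: a resource is reduced to a
// single memory value in MB, keyed by container id.
final class RegistrationRecoverySketch {

    // reportedSizes: sizes from the container statuses (the old resource).
    // increasedContainers: target sizes for increases still in progress in NM.
    static Map<String, Integer> recoverSizes(Map<String, Integer> reportedSizes,
                                             Map<String, Integer> increasedContainers) {
        Map<String, Integer> recovered = new HashMap<>(reportedSizes);
        for (Map.Entry<String, Integer> e : increasedContainers.entrySet()) {
            // Only override containers that NM actually reported a status for.
            if (recovered.containsKey(e.getKey())) {
                recovered.put(e.getKey(), e.getValue());
            }
        }
        return recovered;
    }
}
```

With a status of 1024 MB and a pending target of 4096 MB for the same 
container, this sketch recovers the container at 4096 MB; containers without a 
pending increase keep their reported size.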

bq. The solution I have in mind is that, we do not keep track of extra 
increasedContainers in NMContext. We always rely on NMContext#containers to 
send the container status. RM will check container size based on the 
containerStatus in node heartbeat.

The question I have with this solution is: how does RM know that an increase 
has been successfully completed in NM without an explicit protocol? Does RM 
keep checking the size of each container reported by NM from heartbeat to 
heartbeat, and decide that an increase has been completed if the container size 
from the previous heartbeat is smaller than the container size from the current 
heartbeat? I think this won't work in the RM restart scenario you mentioned. 
Consider the following sequence of events:

* RM restarts while there is an increase going on in NM
* NM re-registers with RM before the container size is updated in NM, and RM 
recovers all containers with old resources, and builds up its internal resource 
bookkeeping for the scheduler
* Later on, the container size is updated in NM, and RM gets the increased 
container size in the next heartbeat request. What should RM do now? It cannot 
simply go ahead and increase the resource bookkeeping in its scheduler, because 
the scheduler did not allocate the extra resource after restart.
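
One way this sequence can go wrong is sketched below. It is a toy model, not 
RM code, and it adds an assumption of my own: that the scheduler hands the 
apparently free memory to another container between the re-registration and 
the later heartbeat. {{HeartbeatDiffSketch}} and its methods are hypothetical 
names, and memory in MB is the only resource tracked.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: scheduler bookkeeping that learns container sizes
// only from registration statuses and later heartbeats.
final class HeartbeatDiffSketch {
    private final int nodeCapacityMb;
    private int accountedMb = 0;
    private final Map<String, Integer> knownSizesMb = new HashMap<>();

    HeartbeatDiffSketch(int nodeCapacityMb) {
        this.nodeCapacityMb = nodeCapacityMb;
    }

    // NM re-registration: recover a container with the size in its status.
    void recoverFromStatus(String containerId, int sizeMb) {
        knownSizesMb.put(containerId, sizeMb);
        accountedMb += sizeMb;
    }

    // Normal scheduling after recovery; fails if the node looks full.
    boolean allocate(String containerId, int sizeMb) {
        if (accountedMb + sizeMb > nodeCapacityMb) {
            return false;
        }
        knownSizesMb.put(containerId, sizeMb);
        accountedMb += sizeMb;
        return true;
    }

    // A heartbeat reports a new size for a known container. Returns true if
    // accepting the growth would exceed what the node can hold.
    boolean heartbeatExceedsCapacity(String containerId, int reportedMb) {
        int unaccountedMb = reportedMb - knownSizesMb.get(containerId);
        return accountedMb + unaccountedMb > nodeCapacityMb;
    }
}
```

For example, on a 4096 MB node, recovering a container at its old 1024 MB size 
and then allocating 3072 MB to another container leaves no room for the first 
container's later report of 2048 MB.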

IMHO, it is crucial for RM to recover the correct container size during NM 
registration if there is a pending container resource increase going on in NM. 
That is why I propose adding {{increasedContainers}} to 
{{RegisterNodeManagerRequestProto}}, and making sure that a container is only 
removed from {{increasedContainers}} when its resize is completed in NM. 
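
On the NM side, the bookkeeping I have in mind could look roughly like the 
sketch below. The class and method names ({{IncreasedContainersSketch}}, 
{{increaseRequested}}, {{resizeCompleted}}, {{forRegistration}}) are 
hypothetical and only illustrate the lifecycle of an entry; memory in MB again 
stands in for the full resource.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of NM-side tracking: an entry stays in
// increasedContainers until the resize has actually completed in NM.
final class IncreasedContainersSketch {
    private final Map<String, Integer> increasedContainers = new LinkedHashMap<>();

    // An increase was accepted for this container; remember the target size.
    void increaseRequested(String containerId, int targetMb) {
        increasedContainers.put(containerId, targetMb);
    }

    // The container now runs at the target size, so the entry can be dropped.
    void resizeCompleted(String containerId) {
        increasedContainers.remove(containerId);
    }

    // Snapshot of pending increases to attach to a registration request.
    Map<String, Integer> forRegistration() {
        return new LinkedHashMap<>(increasedContainers);
    }
}
```

A re-registration that happens between {{increaseRequested}} and 
{{resizeCompleted}} would still carry the target size in this sketch, and one 
that happens afterwards would carry nothing for that container.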

Thoughts/comments?

> RM-NM protocol changes and NodeStatusUpdater implementation to support 
> container resizing
> -----------------------------------------------------------------------------------------
>
>                 Key: YARN-1644
>                 URL: https://issues.apache.org/jira/browse/YARN-1644
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Wangda Tan
>            Assignee: MENG DING
>         Attachments: YARN-1644-YARN-1197.4.patch, 
> YARN-1644-YARN-1197.5.patch, YARN-1644.1.patch, YARN-1644.2.patch, 
> YARN-1644.3.patch, yarn-1644.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
