MENG DING commented on YARN-1645:

Thanks for the review [~jianhe] !

bq. This check should not be needed, because AM should be able to resize an 
existing container no matter RM restarted or not.

I have some concerns about this that I hope you can clarify. 
According to the work-preserving RM restart documentation:

bq. RM recovers its running state by taking advantage of the container statuses 
sent from all NMs. NM will not kill the containers when it re-syncs with the 
restarted RM. It continues managing the containers and send the container 
statuses across to RM when it re-registers. RM reconstructs the container 
instances and the associated applications’ scheduling status by absorbing these 
containers’ information

Consider this scenario:
* RM approves a container resource increase request and sends an increase token 
to AM. 
* Before AM actually increases the resource on NM, RM crashes and then 
restarts. Because of work-preserving recovery, RM reconstructs the 
container's resource from the information sent by NM, which still reflects the 
old allocation from before the increase.
* Now AM performs the increase on NM. If NM doesn't reject it, NM will 
start to enforce the increased resource for the container. At that point RM and 
NM have inconsistent views of the container's resource allocation.
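
To make the concern concrete, here is a rough sketch of the kind of NM-side check I have in mind. It is plain Java with no YARN classes; IncreaseToken, NodeManagerSketch, and the rmIdentifier field are hypothetical names made up for illustration, and the sketch assumes the token records which RM incarnation issued it. It is not the code in the patch.

```java
import java.util.HashMap;
import java.util.Map;

public class IncreaseTokenCheckSketch {

    /** Hypothetical stand-in for an increase token; not a real YARN class. */
    static final class IncreaseToken {
        final String containerId;
        final int memoryMb;
        final long rmIdentifier; // assumed: RM incarnation that issued the token

        IncreaseToken(String containerId, int memoryMb, long rmIdentifier) {
            this.containerId = containerId;
            this.memoryMb = memoryMb;
            this.rmIdentifier = rmIdentifier;
        }
    }

    /** Hypothetical NM-side bookkeeping, reduced to memory only. */
    static final class NodeManagerSketch {
        private final Map<String, Integer> enforcedMb = new HashMap<>();
        private long currentRmIdentifier;

        NodeManagerSketch(long rmIdentifier) {
            this.currentRmIdentifier = rmIdentifier;
        }

        void launch(String containerId, int memoryMb) {
            enforcedMb.put(containerId, memoryMb);
        }

        /** Called when NM re-registers with a restarted RM. */
        void resyncWith(long newRmIdentifier) {
            currentRmIdentifier = newRmIdentifier;
        }

        /**
         * Returns false and changes nothing if the token was issued by an
         * earlier RM incarnation or names an unknown container.
         */
        boolean increase(IncreaseToken token) {
            if (token.rmIdentifier != currentRmIdentifier) {
                return false; // stale token: the restarted RM never saw this increase
            }
            if (!enforcedMb.containsKey(token.containerId)) {
                return false;
            }
            enforcedMb.put(token.containerId, token.memoryMb);
            return true;
        }

        int enforcedMemoryMb(String containerId) {
            return enforcedMb.get(containerId);
        }
    }

    public static void main(String[] args) {
        NodeManagerSketch nm = new NodeManagerSketch(1L);
        nm.launch("container_1", 1024);

        // RM incarnation 1 approves the increase and hands the token to AM.
        IncreaseToken token = new IncreaseToken("container_1", 2048, 1L);

        // RM restarts; NM re-syncs and reports the old 1024 MB allocation.
        nm.resyncWith(2L);

        // AM now presents the token issued before the restart.
        boolean accepted = nm.increase(token);
        System.out.println("accepted=" + accepted
            + ", enforced=" + nm.enforcedMemoryMb("container_1") + " MB");
    }
}
```

In this sketch the stale token is rejected, so NM keeps enforcing 1024 MB, which matches what the restarted RM reconstructed. Without the rmIdentifier comparison NM would move to 2048 MB while RM still believes 1024 MB.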


bq. A lot of code is duplicate between authorizeStartRequest and 
authorizeResourceIncreaseRequest - could you refactor the code to share the 
same code ?
Will do.

bq. Portion of the code belongs to YARN-1644 and the patch won't compile.
This is the same situation as with YARN-1449. Everything is intertwined :-( We may 
need to combine everything into one big patch to submit for the Jenkins build.

> ContainerManager implementation to support container resizing
> -------------------------------------------------------------
>                 Key: YARN-1645
>                 URL: https://issues.apache.org/jira/browse/YARN-1645
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Wangda Tan
>            Assignee: MENG DING
>         Attachments: YARN-1645.1.patch, YARN-1645.2.patch, yarn-1645.1.patch
> Implementation of ContainerManager for container resize, including:
> 1) ContainerManager resize logic 
> 2) Relevant test cases

This message was sent by Atlassian JIRA
