[
https://issues.apache.org/jira/browse/YARN-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14624921#comment-14624921
]
MENG DING commented on YARN-1645:
---------------------------------
Thanks for the review, [~jianhe]!
bq. This check should not be needed, because AM should be able to resize an
existing container no matter RM restarted or not.
I have some concerns about this that I hope to get clarified.
According to the work-preserving RM restart documentation
(http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html):
bq. RM recovers its running state by taking advantage of the container statuses
sent from all NMs. NM will not kill the containers when it re-syncs with the
restarted RM. It continues managing the containers and send the container
statuses across to RM when it re-registers. RM reconstructs the container
instances and the associated applications’ scheduling status by absorbing these
containers’ information.
Consider this scenario:
* RM approves a container resource increase request and sends an increase token
to AM.
* Before AM actually increases the resource on NM, RM crashes and then
restarts. Because of work-preserving recovery, RM reconstructs the
container resource from the information sent by NM, which is still the old
resource allocation for the container from before the increase.
* Now AM performs the increase action on NM. If NM doesn't reject this, it will
start enforcing the increased resource on the container. RM and NM then have
inconsistent views of the container's resource allocation.
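To make the scenario concrete, here is a minimal, self-contained sketch of the three steps. All class, field, and method names are hypothetical and are not the actual YARN classes or the code in the patch. It also assumes, purely for illustration, that the check under discussion compares an RM identifier carried in the increase token against the identifier of the RM instance that NM currently knows.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the scenario; none of these types are real YARN classes.
final class ResizeRaceSketch {

    /** Increase token: target container, approved memory, and the issuing RM instance. */
    static final class IncreaseToken {
        final String containerId;
        final int memoryMb;
        final long rmIdentifier;

        IncreaseToken(String containerId, int memoryMb, long rmIdentifier) {
            this.containerId = containerId;
            this.memoryMb = memoryMb;
            this.rmIdentifier = rmIdentifier;
        }
    }

    /** NM's view: the resource it currently enforces per container. */
    static final class NodeManager {
        final Map<String, Integer> enforced = new HashMap<>();
        final boolean rejectStaleTokens; // assumed form of the check being discussed
        long knownRmIdentifier;

        NodeManager(long knownRmIdentifier, boolean rejectStaleTokens) {
            this.knownRmIdentifier = knownRmIdentifier;
            this.rejectStaleTokens = rejectStaleTokens;
        }

        /** AM asks NM to apply an increase; returns false if the token is rejected. */
        boolean increase(IncreaseToken token) {
            if (rejectStaleTokens && token.rmIdentifier != knownRmIdentifier) {
                return false; // token was issued by an RM instance that no longer exists
            }
            enforced.put(token.containerId, token.memoryMb);
            return true;
        }
    }

    /** RM's view: the resource it believes each container has been allocated. */
    static final class ResourceManager {
        final long identifier;
        final Map<String, Integer> allocations = new HashMap<>();

        ResourceManager(long identifier) {
            this.identifier = identifier;
        }

        IncreaseToken approveIncrease(String containerId, int newMemoryMb) {
            allocations.put(containerId, newMemoryMb);
            return new IncreaseToken(containerId, newMemoryMb, identifier);
        }

        /** Work-preserving recovery: rebuild allocations from what NM reports. */
        static ResourceManager recoverFrom(NodeManager nm, long newIdentifier) {
            ResourceManager rm = new ResourceManager(newIdentifier);
            rm.allocations.putAll(nm.enforced);
            nm.knownRmIdentifier = newIdentifier; // NM re-registers with the new RM
            return rm;
        }
    }

    /** Runs the three steps and reports whether RM and NM agree at the end. */
    static boolean viewsConsistentAfterRestart(boolean rejectStaleTokens) {
        NodeManager nm = new NodeManager(1L, rejectStaleTokens);
        nm.enforced.put("container_1", 1024);
        ResourceManager rm = new ResourceManager(1L);
        rm.allocations.put("container_1", 1024);

        IncreaseToken token = rm.approveIncrease("container_1", 2048); // step 1
        rm = ResourceManager.recoverFrom(nm, 2L);                       // step 2
        nm.increase(token);                                             // step 3
        return rm.allocations.equals(nm.enforced);
    }

    public static void main(String[] args) {
        System.out.println("without check: consistent=" + viewsConsistentAfterRestart(false));
        System.out.println("with check:    consistent=" + viewsConsistentAfterRestart(true));
    }
}
```

In this toy model, the run without the check ends with RM recording 1024 MB and NM enforcing 2048 MB, while the run with the check leaves both at 1024 MB because NM rejects the stale token.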
Thoughts?
bq. A lot of code is duplicate between authorizeStartRequest and
authorizeResourceIncreaseRequest - could you refactor the code to share the
same code ?
Will do.
bq. Portion of the code belongs to YARN-1644 and the patch won't compile.
This is the same situation as with YARN-1449. Everything is intertwined :-( We may
need to combine everything into one big patch to submit for the Jenkins build.
> ContainerManager implementation to support container resizing
> -------------------------------------------------------------
>
> Key: YARN-1645
> URL: https://issues.apache.org/jira/browse/YARN-1645
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Reporter: Wangda Tan
> Assignee: MENG DING
> Attachments: YARN-1645.1.patch, YARN-1645.2.patch, yarn-1645.1.patch
>
>
> Implementation of ContainerManager for container resize, including:
> 1) ContainerManager resize logic
> 2) Relevant test cases