[
https://issues.apache.org/jira/browse/YARN-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699939#comment-14699939
]
MENG DING commented on YARN-1644:
---------------------------------
I had an offline discussion with [~jianhe] a while ago, and we thought that the
race condition in scenario 3 can be handled in a separate JIRA, as it applies
to both increasing container size and starting a container.
For this ticket, we are exploring the idea of removing the
{{increasedContainers}} list from NM. {{increasedContainers}} was
originally introduced as a way for NM to inform RM that an increase action has
completed in NM. However, it seems that we may achieve the same result by
checking {{containerStatuses}}. In particular, RM would compare the container
sizes reported in consecutive heartbeats. For each container:
* If the container size reported in this heartbeat is larger than the size
reported in the previous heartbeat:
** If the reported size is the same as RM's bookkeeping for this container,
then this is a confirmation of container resource increase.
** If the reported size is larger than RM's bookkeeping for this container,
then this is due to an RM recovery during container resource increase in NM. RM
should increase its bookkeeping of this container to match the reported size.
** If the reported size is smaller than RM's bookkeeping for this container,
then this should be treated as an error.
* If the container size reported in this heartbeat is smaller than the size
reported in the previous heartbeat:
** If the reported size is the same as RM's bookkeeping for this container,
then this is a confirmation of container resource decrease.
** Any other case should be treated as an error.
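The decision rules above can be sketched roughly as follows. This is only an illustration, not code from any patch on this ticket: the class, enum, and method names are hypothetical, container size is reduced to a single number (for example memory in MB) rather than a full resource, and the equal-size case is assumed to mean "no change" since it is not covered by the rules above.

```java
// Hypothetical sketch of the per-container check RM could run on each heartbeat.
// All names are illustrative assumptions; they are not existing YARN classes.
class HeartbeatResizeCheck {

    enum Outcome {
        NO_CHANGE,               // assumption: equal sizes mean nothing happened
        INCREASE_CONFIRMED,      // NM finished an increase RM already knows about
        SYNC_AFTER_RM_RECOVERY,  // RM should raise its bookkeeping to the reported size
        DECREASE_CONFIRMED,      // NM finished a decrease RM already knows about
        ERROR
    }

    // previousReported: size reported in the previous heartbeat
    // currentReported:  size reported in this heartbeat
    // rmBookkeeping:    size RM currently records for this container
    static Outcome classify(long previousReported, long currentReported, long rmBookkeeping) {
        if (currentReported > previousReported) {
            if (currentReported == rmBookkeeping) {
                return Outcome.INCREASE_CONFIRMED;
            }
            if (currentReported > rmBookkeeping) {
                return Outcome.SYNC_AFTER_RM_RECOVERY;
            }
            return Outcome.ERROR; // reported size grew but is below RM's record
        }
        if (currentReported < previousReported) {
            return currentReported == rmBookkeeping
                    ? Outcome.DECREASE_CONFIRMED
                    : Outcome.ERROR;
        }
        return Outcome.NO_CHANGE;
    }

    public static void main(String[] args) {
        // Increase 1024 -> 2048 that RM already granted.
        System.out.println(classify(1024, 2048, 2048));
        // RM restarted with stale bookkeeping (1024) while NM increased to 2048.
        System.out.println(classify(1024, 2048, 1024));
        // Decrease 2048 -> 1024 that RM already recorded.
        System.out.println(classify(2048, 1024, 1024));
    }
}
```

In the {{SYNC_AFTER_RM_RECOVERY}} case the caller would then update RM's record to the reported size; how an {{ERROR}} is handled is left open here, as it is in the rules above.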
The validity of this approach is still being evaluated. Any comments/concerns
are welcome.
> RM-NM protocol changes and NodeStatusUpdater implementation to support
> container resizing
> -----------------------------------------------------------------------------------------
>
> Key: YARN-1644
> URL: https://issues.apache.org/jira/browse/YARN-1644
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Reporter: Wangda Tan
> Assignee: MENG DING
> Attachments: YARN-1644-YARN-1197.4.patch,
> YARN-1644-YARN-1197.5.patch, YARN-1644.1.patch, YARN-1644.2.patch,
> YARN-1644.3.patch, yarn-1644.1.patch
>
>