[
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586999#comment-14586999
]
MENG DING commented on YARN-1197:
---------------------------------
[~sandyr], Yes. The key assumption is that by the time the Application Master
(AM) requests a resource decrease from RM for a particular container, that
container has already reduced its resource usage. RM can therefore
immediately allocate the freed resources to other containers.
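A minimal sketch of this assumption in Python, not actual YARN code: the `Node` class, its fields, and the memory figures are all hypothetical. It only shows that the RM-side bookkeeping can release capacity the moment the decrease request arrives, without waiting for NM.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical RM-side view of one NodeManager's memory."""
    total_mb: int
    allocated_mb: dict = field(default_factory=dict)  # container id -> MB

    def available_mb(self) -> int:
        return self.total_mb - sum(self.allocated_mb.values())

    def decrease(self, container_id: str, new_mb: int) -> int:
        """Shrink a container's allocation and return the MB freed.

        Assumes the container has already reduced its real usage, so the
        freed capacity is schedulable at once. NM-side enforcement and
        monitoring are not modeled here.
        """
        current = self.allocated_mb[container_id]
        if new_mb <= 0 or new_mb >= current:
            raise ValueError("new size must be positive and below the current size")
        self.allocated_mb[container_id] = new_mb
        return current - new_mb


# Example: shrinking c1 from 4096 MB to 1024 MB on a full 8192 MB node.
node = Node(total_mb=8192, allocated_mb={"c1": 4096, "c2": 4096})
freed = node.decrease("c1", 1024)
```
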
So to summarize the main idea:
* Both container resource increase and decrease requests go through RM. This
eliminates the race condition in which a decrease for a container takes place
while an increase for the same container is still in progress.
* There is no longer any need for an AM-NM protocol. This greatly simplifies
the logic for application writers.
* Resource decrease can happen immediately in RM, and the actual
enforcement/monitoring of the decrease can happen offline, as mentioned by
Vinod.
* Resource increase, on the other hand, needs more thought.
** In the current design, RM gives out an increase token for AM to use to
initiate the increase on NM. There is no need for this. RM can notify NM of
the increase through the RM-NM heartbeat response.
** RM still needs to wait for an acknowledgement from NM confirming that the
increase is done before sending its response to AM. This takes two heartbeat
cycles, but that is not much worse than giving a token to AM first and then
letting AM initiate the increase.
** Since RM needs to wait for an acknowledgement from NM to confirm the
increase, we must handle cases such as timeout and NM restart/recovery. So we
probably still need a container increase token, with token expiration logic,
for this purpose, but the token will be sent to NM through the RM-NM
heartbeat protocol. (I am still working out the details.)
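The increase path proposed above could be sketched roughly as follows. This is an illustrative Python model, not YARN code: `ResourceManagerSketch`, `IncreaseToken`, the state names, and the tick-based expiry are all assumptions made for the example, and only a single NM is modeled. It shows the token travelling in one heartbeat response, the acknowledgement arriving on the next heartbeat, and expiry when no acknowledgement comes.

```python
from dataclasses import dataclass
from enum import Enum


class IncreaseState(Enum):
    PENDING = "pending"      # approved by RM, not yet sent to NM
    SENT = "sent"            # carried in a heartbeat response to NM
    CONFIRMED = "confirmed"  # NM acknowledged; RM can now respond to AM
    EXPIRED = "expired"      # no acknowledgement before the deadline


@dataclass
class IncreaseToken:
    container_id: str
    target_mb: int
    expires_at: int          # logical time, counted in heartbeat ticks
    state: IncreaseState = IncreaseState.PENDING


class ResourceManagerSketch:
    """Hypothetical RM that tracks increases for a single NM."""

    def __init__(self, token_ttl_ticks: int = 3):
        self.token_ttl_ticks = token_ttl_ticks
        self.tokens = {}     # container id -> IncreaseToken

    def request_increase(self, container_id, target_mb, now):
        token = IncreaseToken(container_id, target_mb, now + self.token_ttl_ticks)
        self.tokens[container_id] = token
        return token

    def on_heartbeat(self, acked_ids, now):
        """Process one NM heartbeat; return tokens to put in the response."""
        # Acknowledgements are applied first, so an ack that arrives on the
        # expiry tick still counts.
        for cid in acked_ids:
            token = self.tokens.get(cid)
            if token and token.state is IncreaseState.SENT:
                token.state = IncreaseState.CONFIRMED
        outgoing = []
        unfinished = (IncreaseState.PENDING, IncreaseState.SENT)
        for token in self.tokens.values():
            if token.state in unfinished and now >= token.expires_at:
                token.state = IncreaseState.EXPIRED
            elif token.state is IncreaseState.PENDING:
                token.state = IncreaseState.SENT
                outgoing.append(token)
        return outgoing


# Normal case: two heartbeat cycles from request to confirmation.
rm = ResourceManagerSketch(token_ttl_ticks=3)
rm.request_increase("c1", 2048, now=0)
sent = rm.on_heartbeat(acked_ids=[], now=1)   # token goes out to NM
rm.on_heartbeat(acked_ids=["c1"], now=2)      # NM acknowledges

# Timeout case: NM never acknowledges, e.g. because it restarted.
rm_timeout = ResourceManagerSketch(token_ttl_ticks=3)
rm_timeout.request_increase("c2", 2048, now=0)
rm_timeout.on_heartbeat(acked_ids=[], now=1)
rm_timeout.on_heartbeat(acked_ids=[], now=3)
```

In this sketch an expired token simply ends in the `EXPIRED` state; what RM should do next (roll back the reservation, notify AM) is exactly the open detail mentioned above and is left out.
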
> Support changing resources of an allocated container
> ----------------------------------------------------
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
> Issue Type: Task
> Components: api, nodemanager, resourcemanager
> Affects Versions: 2.1.0-beta
> Reporter: Wangda Tan
> Attachments: YARN-1197 old-design-docs-patches-for-reference.zip,
> YARN-1197_Design.pdf
>
>
> The current YARN resource management logic assumes that the resources
> allocated to a container are fixed for its lifetime. When users want to
> change the resources of an allocated container, the only way is to release
> it and allocate a new container of the expected size.
> Allowing the resources of an allocated container to be changed at run time
> will give us better control of resource usage on the application side.