MENG DING commented on YARN-1197:

[~sandyr], Yes. The key assumption is that by the time the Application Master 
requests a resource decrease from RM for a particular container, that container 
has already reduced its resource usage. Therefore, RM can immediately 
allocate the freed resource to others. 

So to summarize the main idea:
* Both container resource increase and decrease requests go through RM. This 
eliminates the race condition in which a decrease for a container takes place 
while an increase for the same container is still in progress.
* There is no need for an AM-NM protocol anymore. This greatly simplifies the 
logic for application writers.
* Resource decrease can happen immediately in RM, and the actual 
enforcement and monitoring of the decrease can happen offline, as mentioned by 
Vinod.
* Resource increase, on the other hand, needs more thought. 
** In the current design, the RM gives out an increase token to be used by AM 
to initiate the increase on NM. There is no need for this. RM can notify NM of 
the increase through the RM-NM heartbeat response.
** RM still needs to wait for an acknowledgement from NM confirming that the 
increase is done before sending the response to AM. This will take two 
heartbeat cycles, but that is not much worse than giving out a token to AM 
first and then letting AM initiate the increase.
** Since RM needs to wait for an acknowledgement from NM to confirm the increase, 
we must handle cases such as timeout, NM restart/recovery, etc. So we probably 
still need a container increase token, and token expiration logic, for 
this purpose, but the token will be sent to NM through the RM-NM heartbeat 
protocol. (I am still working out the details.)
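
To make the flow above concrete, here is a rough sketch of the RM-side bookkeeping. It is only an illustration under my own assumptions, not the YARN API or the final design: the class ResizeSketch and all of its methods are hypothetical names, resources are reduced to a single memory figure in MB, and the "token" is modeled as nothing more than an expiry timestamp. Rejecting a decrease while an increase is pending is also just one possible policy.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical RM-side bookkeeping for container resizing. Not YARN's API.
class ResizeSketch {
    // An increase the RM has reserved but the NM has not yet acknowledged.
    private static final class PendingIncrease {
        final int targetMb;
        final long expiresAtMs; // stands in for the increase token's expiry
        PendingIncrease(int targetMb, long expiresAtMs) {
            this.targetMb = targetMb;
            this.expiresAtMs = expiresAtMs;
        }
    }

    private final Map<String, Integer> allocatedMb = new HashMap<>();
    private final Map<String, PendingIncrease> pending = new HashMap<>();
    private final long tokenLifetimeMs;
    private int freeMb;

    ResizeSketch(int freeMb, long tokenLifetimeMs) {
        this.freeMb = freeMb;
        this.tokenLifetimeMs = tokenLifetimeMs;
    }

    int freeMb() { return freeMb; }
    int allocatedMb(String id) { return allocatedMb.getOrDefault(id, 0); }

    boolean allocate(String id, int mb) {
        if (mb > freeMb || allocatedMb.containsKey(id)) return false;
        freeMb -= mb;
        allocatedMb.put(id, mb);
        return true;
    }

    // Decrease: applied at once, on the assumption that the AM has already
    // shrunk the container's usage. Enforcement on the NM happens later.
    boolean decrease(String id, int newMb) {
        Integer cur = allocatedMb.get(id);
        if (cur == null || newMb >= cur || pending.containsKey(id)) return false;
        allocatedMb.put(id, newMb);
        freeMb += cur - newMb;
        return true;
    }

    // Increase, step 1: reserve the delta and remember it; the request would
    // travel to the NM in the next RM-NM heartbeat response.
    boolean requestIncrease(String id, int newMb, long nowMs) {
        Integer cur = allocatedMb.get(id);
        if (cur == null || newMb <= cur || pending.containsKey(id)) return false;
        int delta = newMb - cur;
        if (delta > freeMb) return false;
        freeMb -= delta;
        pending.put(id, new PendingIncrease(newMb, nowMs + tokenLifetimeMs));
        return true;
    }

    // Increase, step 2: the NM acknowledges in a later heartbeat. Only now
    // would the RM report the new size back to the AM.
    boolean onNodeAck(String id, long nowMs) {
        PendingIncrease p = pending.get(id);
        if (p == null || nowMs > p.expiresAtMs) return false;
        pending.remove(id);
        allocatedMb.put(id, p.targetMb);
        return true;
    }

    // Timeout / NM restart: roll back reservations whose token has expired.
    void expire(long nowMs) {
        Iterator<Map.Entry<String, PendingIncrease>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, PendingIncrease> e = it.next();
            if (nowMs > e.getValue().expiresAtMs) {
                freeMb += e.getValue().targetMb - allocatedMb.get(e.getKey());
                it.remove();
            }
        }
    }
}
```

The point of the sketch is the asymmetry: decrease() returns the resource to the pool in one step, while an increase only becomes visible as allocated after onNodeAck(), and expire() covers the case where the acknowledgement never arrives.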

> Support changing resources of an allocated container
> ----------------------------------------------------
>                 Key: YARN-1197
>                 URL: https://issues.apache.org/jira/browse/YARN-1197
>             Project: Hadoop YARN
>          Issue Type: Task
>          Components: api, nodemanager, resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Wangda Tan
>         Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, 
> YARN-1197_Design.pdf
> The current YARN resource management logic assumes the resource allocated to a 
> container is fixed during its lifetime. When users want to change a resource 
> of an allocated container, the only way is to release it and allocate a new 
> container with the expected size.
> Allowing run-time changes to the resources of an allocated container will give 
> us better control of resource usage on the application side.
