[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559457#comment-14559457
 ] 

Vinod Kumar Vavilapalli commented on YARN-1197:
-----------------------------------------------

Thanks for taking this up, [~mding]!

Read your updated doc. Looks good overall. Pretty comprehensive, great work!

Some comments

h4. Major
 - Expanding containers at ACQUIRED state sounds useful in theory, but I agree 
with you that we can punt it for later.
 - To your example of concurrent increase/decrease sizing requests from AM, 
shall we simply say that only one change-in-progress is allowed for any given 
container?
 - If we do the above, this will also simplify most of the code, as we will 
simply have the notion of a _Change_, instead of an explicit increase/decrease 
everywhere. For example, we will just have a ContainerResourceChangeExpirer.
 - There will be races with container-states toggling from RUNNING to finished 
states, depending on when AM requests a size-change and when NMs report that a 
container finished. We can simply say that the state at the ResourceManager 
wins. 
 - bq. After processing all resource change messages for a container in a node 
update, RM will set the current resource allocation known by RM for this 
container in the next node heartbeat response, so that NM will  (eventually) 
have the same view of the resource allocation of this container with RM, and 
monitor/enforce accordingly.
Didn't understand why we need this RM-NM confirmation. The token from RM to AM 
to NM should be enough for NM to update its view, right?
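
To make the single change-in-progress idea concrete, here is a minimal sketch. It is not the YARN implementation: the class PendingChangeTracker, its enums, and the use of plain String container ids and memory-only sizes are all hypothetical and chosen only for illustration. It shows the two rules proposed above: a second change request for the same container is rejected while one is pending, and the state recorded at the RM decides whether a request is accepted at all.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: at most one pending resource change per container. */
final class PendingChangeTracker {
    /** Outcome of submitting a change request (illustrative names). */
    enum Result { ACCEPTED, REJECTED_CHANGE_IN_PROGRESS, REJECTED_NOT_RUNNING }

    /** Simplified container states as recorded at the RM. */
    enum State { ACQUIRED, RUNNING, COMPLETED }

    // containerId -> target memory (MB) of the change currently in progress
    private final Map<String, Integer> pending = new HashMap<>();
    // containerId -> state known to the RM; this view wins any race
    private final Map<String, State> rmState = new HashMap<>();

    /** Records a state transition observed by the RM. */
    synchronized void setState(String containerId, State state) {
        rmState.put(containerId, state);
        if (state == State.COMPLETED) {
            // A finished container drops whatever change was pending.
            pending.remove(containerId);
        }
    }

    /** Accepts a change only for a RUNNING container with no change pending. */
    synchronized Result requestChange(String containerId, int targetMemoryMb) {
        if (rmState.get(containerId) != State.RUNNING) {
            return Result.REJECTED_NOT_RUNNING;
        }
        if (pending.containsKey(containerId)) {
            return Result.REJECTED_CHANGE_IN_PROGRESS;
        }
        pending.put(containerId, targetMemoryMb);
        return Result.ACCEPTED;
    }

    /** Clears the pending change, e.g. on confirmation or expiry. */
    synchronized boolean completeChange(String containerId) {
        return pending.remove(containerId) != null;
    }
}
```

With a single pending entry per container, one expirer can time out any change regardless of direction, which is the simplification a single ContainerResourceChangeExpirer would rely on.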

h4. Minor
 - Instead of adding new records for ContainerResourceIncrease / decrease in 
AllocationResponse, should we add a new field in the API record itself stating 
if it is a New/Increased/Decreased container? If we move to a single change 
model, it's likely we will not even need this.
 - Any obviously invalid change-requests should be rejected right away. For 
example, an increase to more than the cluster's max container size. It seemed 
like you are suggesting we ignore the invalid requests.
 - Nit: In the design doc, the high-level flow for container-increase point #7 
incorrectly talks about decrease instead of increase.
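
As an illustration of rejecting invalid requests up front rather than silently ignoring them, here is a minimal sketch. The class ChangeRequestValidator and its method are hypothetical, and sizes are reduced to memory in MB; in a real cluster the bounds would presumably come from the scheduler's minimum and maximum allocation settings.

```java
import java.util.Optional;

/** Hypothetical sketch: reject obviously invalid change requests immediately. */
final class ChangeRequestValidator {
    private final int minMemoryMb;
    private final int maxMemoryMb;

    ChangeRequestValidator(int minMemoryMb, int maxMemoryMb) {
        if (minMemoryMb <= 0 || maxMemoryMb < minMemoryMb) {
            throw new IllegalArgumentException("invalid allocation bounds");
        }
        this.minMemoryMb = minMemoryMb;
        this.maxMemoryMb = maxMemoryMb;
    }

    /**
     * Returns an empty Optional when the request is valid, otherwise a
     * rejection reason that can be sent back to the AM right away.
     */
    Optional<String> validate(int currentMemoryMb, int targetMemoryMb) {
        if (targetMemoryMb > maxMemoryMb) {
            return Optional.of("target " + targetMemoryMb
                + " MB exceeds max container size " + maxMemoryMb + " MB");
        }
        if (targetMemoryMb < minMemoryMb) {
            return Optional.of("target " + targetMemoryMb
                + " MB is below min container size " + minMemoryMb + " MB");
        }
        if (targetMemoryMb == currentMemoryMb) {
            return Optional.of("target equals current size; nothing to change");
        }
        return Optional.empty();
    }
}
```

Returning a reason instead of dropping the request lets the AM learn immediately that the change will never happen, instead of waiting for a response that does not come.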

Just caught up with the rest of your discussion w.r.t. decreasing 
container-sizes. The feature is useful outside of JVM processes (C code, 
servers managing their data off-heap, etc.), so we can continue working on it.

h4. Process
I propose we do this in a branch. We got in a couple of patches earlier from 
[~leftnoteasy] and then the feature unfortunately dropped on the floor. A branch 
helps avoid this going forward.

> Support changing resources of an allocated container
> ----------------------------------------------------
>
>                 Key: YARN-1197
>                 URL: https://issues.apache.org/jira/browse/YARN-1197
>             Project: Hadoop YARN
>          Issue Type: Task
>          Components: api, nodemanager, resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Wangda Tan
>         Attachments: YARN-1197_Design.pdf, mapreduce-project.patch.ver.1, 
> tools-project.patch.ver.1, yarn-1197-scheduler-v1.pdf, yarn-1197-v2.pdf, 
> yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, 
> yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, 
> yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, 
> yarn-server-resourcemanager.patch.ver.1
>
>
> The current YARN resource management logic assumes the resources allocated to a 
> container are fixed for its lifetime. When users want to change the resources 
> of an allocated container, the only way is to release it and allocate a new 
> container of the expected size.
> Allowing run-time changes to the resources of an allocated container will give us 
> better control of resource usage on the application side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
