Wangda Tan commented on YARN-1197:

Thanks to [~mding] for thinking this through and extending it into a thorough 
design doc, and to [~vinodkv] for the review. I would really like to see this 
move forward.

To Vinod's comment:
bq. Didn't understand why we need this RM-NM confirmation. The token from RM to 
AM to NM should be enough for NM to update its view, right?
This is to make sure RM/NM are synchronized, one example is mentioned in 
In this design, NM and RM communicate in both directions, so the RM needs to 
acknowledge changes to the NM. The NM can then update its container monitoring 
status locally, which avoids inconsistency.
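
One possible reading of this acknowledgment step, as a minimal sketch: the NM keeps enforcing the old container size until the RM confirms the change. The class and method names ({{NodeMonitorView}}, {{requestChange}}, {{onRmAck}}) are hypothetical and are not taken from the design doc or from any YARN API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical NM-side view: a change is enforced only after the RM acknowledges it. */
final class NodeMonitorView {
    // containerId -> memory limit (MB) the monitor currently enforces
    private final Map<String, Integer> enforcedMb = new HashMap<>();
    // containerId -> requested limit (MB) still waiting for an RM acknowledgment
    private final Map<String, Integer> awaitingAckMb = new HashMap<>();

    void startContainer(String containerId, int memoryMb) {
        enforcedMb.put(containerId, memoryMb);
    }

    /** Records a requested change without touching the enforced limit. */
    void requestChange(String containerId, int targetMb) {
        if (!enforcedMb.containsKey(containerId)) {
            throw new IllegalArgumentException("unknown container: " + containerId);
        }
        awaitingAckMb.put(containerId, targetMb);
    }

    /** Applies the change if the RM acknowledgment matches what is waiting. */
    boolean onRmAck(String containerId, int acknowledgedMb) {
        Integer waiting = awaitingAckMb.get(containerId);
        if (waiting == null || waiting != acknowledgedMb) {
            return false; // nothing waiting, or RM and NM disagree: keep old limit
        }
        awaitingAckMb.remove(containerId);
        enforcedMb.put(containerId, acknowledgedMb);
        return true;
    }

    /** Returns the enforced limit in MB, or null for an unknown container. */
    Integer enforcedLimit(String containerId) {
        return enforcedMb.get(containerId);
    }
}
```

In this sketch a mismatched or unexpected acknowledgment leaves the enforced limit unchanged, so the NM's monitoring never runs ahead of what the RM has confirmed.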

bq. To your example of concurrent increase/decrease sizing requests from AM, 
shall we simply say that only one change-in-progress is allowed for any given 
Actually the approach in the design doc is this (Meng, please let me know if I 
misunderstood): the scheduler's implementation allows only one pending 
change request for the same container; a later change request will either 
overwrite the prior one or be rejected.
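
The one-pending-change rule can be illustrated with a minimal sketch. Everything here ({{PendingChangeTracker}}, {{ConflictPolicy}}, the use of a string container ID and a memory target in MB) is an assumption made for illustration and is not the scheduler's actual data structure.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: at most one pending resource-change request per container. */
final class PendingChangeTracker {
    /** What to do when a request arrives while another one is still pending. */
    enum ConflictPolicy { OVERWRITE, REJECT }

    private final ConflictPolicy policy;
    // containerId -> target memory (MB) of the single pending request
    private final Map<String, Integer> pending = new HashMap<>();

    PendingChangeTracker(ConflictPolicy policy) {
        this.policy = policy;
    }

    /** Returns true if the request was recorded, false if it was rejected. */
    boolean submit(String containerId, int targetMemoryMb) {
        if (pending.containsKey(containerId) && policy == ConflictPolicy.REJECT) {
            return false; // keep the earlier request untouched
        }
        pending.put(containerId, targetMemoryMb); // first request, or overwrite
        return true;
    }

    /** Removes and returns the pending target, or null if none is pending. */
    Integer complete(String containerId) {
        return pending.remove(containerId);
    }

    /** Returns the pending target in MB, or null if none is pending. */
    Integer pendingTarget(String containerId) {
        return pending.get(containerId);
    }
}
```

With {{OVERWRITE}}, submitting 4096 and then 6144 for the same container leaves 6144 as the only pending target; with {{REJECT}}, the second call returns false and 4096 stays pending.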

Some feedback on the design doc so far:
1) The protocols between servers/AMs are mostly the same as in the previous 
doc; the biggest change I can see is the {{ContainerResourceChangeProto}} in 
{{NodeHeartbeatResponseProto}}, which makes sense to me.
2) For the client side change: 2.2.1, +1 to option 3.
3) For the scheduling part: {{The scheduling of an outstanding resource 
increase request to a container will be skipped if there are
either:}}. Neither of the two conditions may be needed, since the AM can ask 
for more resource after a container has been increased (e.g. the container was 
increased to 4G, and the AM wants it to be 6G before notifying the NM).
4) We may not need a "reserved increase request"; all increase requests should 
be considered "reserved". But we still need to respect the order of 
applications in LeafQueue, whether it is the original FIFO or Fair (added after 
YARN-3306). We can discuss more scheduling details in a separate JIRA.

I will clean up the subtasks (some of them are too detailed in my opinion, 
especially for scheduler internal changes). I will post once I have finished.

> Support changing resources of an allocated container
> ----------------------------------------------------
>                 Key: YARN-1197
>                 URL: https://issues.apache.org/jira/browse/YARN-1197
>             Project: Hadoop YARN
>          Issue Type: Task
>          Components: api, nodemanager, resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Wangda Tan
>         Attachments: YARN-1197_Design.pdf, mapreduce-project.patch.ver.1, 
> tools-project.patch.ver.1, yarn-1197-scheduler-v1.pdf, yarn-1197-v2.pdf, 
> yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, 
> yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, 
> yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, 
> yarn-server-resourcemanager.patch.ver.1
> The current YARN resource management logic assumes resource allocated to a 
> container is fixed during the lifetime of it. When users want to change a 
> resource of an allocated container the only way is releasing it and 
> allocating a new container with expected size.
> Allowing run-time changing resources of an allocated container will give us 
> better control of resource usage in application side
