[
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559869#comment-14559869
]
Wangda Tan commented on YARN-1197:
----------------------------------
Thanks to [~mding] for thinking this through and extending it into a thorough
design doc, and to [~vinodkv] for the review. I would really like to see this
move forward.
To Vinod's comment:
bq. Didn't understand why we need this RM-NM confirmation. The token from RM to
AM to NM should be enough for NM to update its view, right?
This is to make sure the RM and NM stay synchronized; one example is mentioned in
https://issues.apache.org/jira/browse/YARN-1197?focusedCommentId=14559284&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14559284.
In this design, the NM and RM communicate in both directions, so the RM needs to
acknowledge changes to the NM so that the NM can update its container
monitoring status locally and avoid inconsistency.
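To illustrate the idea, here is a minimal sketch of an NM-side view that only changes the monitored limit once the RM has acknowledged the change. It is an assumption-laden illustration, not the actual NodeManager code: the class {{NodeContainerMonitorSketch}} and its methods are hypothetical names, and resources are reduced to a single memory value in MB.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of NM-side bookkeeping; not real YARN code. */
class NodeContainerMonitorSketch {
    // containerId -> memory limit (MB) the NM currently enforces
    private final Map<String, Integer> monitoredMb = new HashMap<>();
    // containerId -> proposed limit (MB) still waiting for the RM's acknowledgment
    private final Map<String, Integer> awaitingAckMb = new HashMap<>();

    void startContainer(String containerId, int memoryMb) {
        monitoredMb.put(containerId, memoryMb);
    }

    /** A change was presented to the NM; record it but keep enforcing the old limit. */
    void proposeChange(String containerId, int newMemoryMb) {
        if (!monitoredMb.containsKey(containerId)) {
            throw new IllegalArgumentException("unknown container: " + containerId);
        }
        awaitingAckMb.put(containerId, newMemoryMb);
    }

    /**
     * The RM acknowledged a size (assumed here to arrive with a heartbeat response).
     * The monitored limit changes only if the acknowledged size matches the proposal.
     */
    boolean onRmAcknowledged(String containerId, int ackedMemoryMb) {
        Integer proposed = awaitingAckMb.get(containerId);
        if (proposed == null || proposed != ackedMemoryMb) {
            return false; // nothing pending, or RM and NM disagree: keep the old limit
        }
        monitoredMb.put(containerId, ackedMemoryMb);
        awaitingAckMb.remove(containerId);
        return true;
    }

    int monitoredLimitMb(String containerId) {
        return monitoredMb.get(containerId);
    }
}
```

The point of the two maps is that the enforced limit and the proposed limit never get mixed up: until the acknowledgment arrives, monitoring keeps using the size both sides already agree on.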
bq. To your example of concurrent increase/decrease sizing requests from AM,
shall we simply say that only one change-in-progress is allowed for any given
container?
Actually, this is the approach in the design doc (Meng, please let me know if I
misunderstood). The scheduler's implementation allows only one pending change
request for a given container; a later change request will either overwrite the
prior one or be rejected.
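As a rough illustration of "one pending change request per container", the following sketch tracks at most one pending request per container and applies either an overwrite or a reject policy to later requests. The names ({{PendingChangeTracker}}, {{ConflictPolicy}}) are hypothetical and the resource is simplified to memory in MB; the real scheduler data structures are not shown in this thread.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; not the actual YARN scheduler implementation. */
class PendingChangeTracker {
    /** What to do when a container already has a pending change request. */
    enum ConflictPolicy { OVERWRITE, REJECT }

    private final ConflictPolicy policy;
    // containerId -> target memory (MB) of the single pending change request
    private final Map<String, Integer> pending = new HashMap<>();

    PendingChangeTracker(ConflictPolicy policy) {
        this.policy = policy;
    }

    /** Returns true if the request is now the pending one for this container. */
    synchronized boolean submit(String containerId, int targetMemoryMb) {
        if (pending.containsKey(containerId) && policy == ConflictPolicy.REJECT) {
            return false; // an earlier request is still pending; keep it
        }
        pending.put(containerId, targetMemoryMb); // first request, or overwrite
        return true;
    }

    /** Returns the pending target in MB, or null if nothing is pending. */
    synchronized Integer pendingTarget(String containerId) {
        return pending.get(containerId);
    }

    /** Called once the pending change has been applied or cancelled. */
    synchronized void complete(String containerId) {
        pending.remove(containerId);
    }
}
```

Keying the map by container ID makes the "at most one pending request" property structural rather than something each caller has to check.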
Some feedback on the design doc so far:
1) The protocols between servers/AMs are mostly the same as in the previous
doc; the biggest change I can see is the {{ContainerResourceChangeProto}} in
{{NodeHeartbeatResponseProto}}, which makes sense to me.
2) For the client side change: 2.2.1, +1 to option 3.
3) For the 2.3.3.2 scheduling part, {{The scheduling of an outstanding resource
increase request to a container will be skipped if there are
either:}}. Neither of the two conditions may be needed, since the AM can ask
for more resources after a container has been increased (e.g. the container is
increased to 4G, and the AM wants it to be 6G before notifying the NM).
4) We may not need a "reserved increase request"; all increase requests should
be considered "reserved". But we still need to respect the ordering of
applications in the LeafQueue, whether it is the original FIFO or Fair (added
after YARN-3306). We can discuss more scheduling details in a separate JIRA.
I will clean up the subtasks (some of them are too detailed in my opinion,
especially those for scheduler-internal changes). I will post once I have
finished.
> Support changing resources of an allocated container
> ----------------------------------------------------
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
> Issue Type: Task
> Components: api, nodemanager, resourcemanager
> Affects Versions: 2.1.0-beta
> Reporter: Wangda Tan
> Attachments: YARN-1197_Design.pdf, mapreduce-project.patch.ver.1,
> tools-project.patch.ver.1, yarn-1197-scheduler-v1.pdf, yarn-1197-v2.pdf,
> yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf,
> yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1,
> yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1,
> yarn-server-resourcemanager.patch.ver.1
>
>
> The current YARN resource management logic assumes the resources allocated
> to a container are fixed during its lifetime. When users want to change the
> resources of an allocated container, the only way is to release it and
> allocate a new container with the expected size.
> Allowing run-time changes to the resources of an allocated container will
> give us better control of resource usage on the application side.