[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588889#comment-14588889 ]
Wangda Tan commented on YARN-1197:
----------------------------------

Thanks for the comments, [~sseth]/[~sandyr]. I'm now convinced, given the view of two downstream developers. +1 to doing the AM-RM-AM-NM flow (a) for increase, as in the original doc, before (b). I'm not sure (b) is really required; we can do (b) if there are any real use cases.

bq. More broadly, just because YARN is not good at hitting sub-second latencies doesn't mean that it isn't a design goal. I strongly oppose any argument that uses the current slowness of YARN as a justification for why we should make architectural decisions that could compromise latencies.

Makes sense to me.

bq. I.e. that an AM can receive an increase from the RM, then issue a decrease to the NM, and then use its increase to get resources it doesn't deserve?

Yes. If we send increase requests to the RM but decrease requests to the NM, we need to handle complex inconsistencies on the RM side. You can take a look at the latest design doc for more details.

bq. I don't think it's possible for the AM to start using the additional allocation till the NM has updated all its state - including writing out recovery information for work preserving restart (Thanks Vinod for pointing this out). Seems like that poll/callback will be required - unless the plan is to route this information via the RM.

Maybe we need to wait for all the increase steps (monitor/cgroup/state-store) to finish before using the additional allocation. Suppose a container is 5G and is increased to 10G, the RM/NM crashes before writing to the state store, and the app starts using 10G. After RM restart/recovery, the NM/RM will think the container is 5G, which will be problematic.

[~mding], do you agree with doing (a)?
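To make the 5G to 10G scenario concrete, here is a minimal sketch of the ordering argued for above: the new size is reported as usable only after the enforcement step and the state-store write have both finished. All names here (ContainerIncreaseSketch, StateStore, Container, failNextWrite) are hypothetical and are not YARN APIs; the state store is an in-memory stand-in, and the rollback on failure is an assumption of this sketch rather than something the design doc specifies.

```java
// Hypothetical sketch only: none of these classes exist in YARN.
final class ContainerIncreaseSketch {

    /** In-memory stand-in for the recovery state store. */
    static final class StateStore {
        private long persistedMb;
        private boolean failNextWrite;

        StateStore(long initialMb) {
            this.persistedMb = initialMb;
        }

        /** Makes the next write() fail, simulating a crash before the write lands. */
        void failNextWrite() {
            this.failNextWrite = true;
        }

        void write(long mb) {
            if (failNextWrite) {
                failNextWrite = false;
                throw new IllegalStateException("simulated crash before state-store write");
            }
            persistedMb = mb;
        }

        /** The size that recovery would see after a restart. */
        long recover() {
            return persistedMb;
        }
    }

    /** A container whose size may grow while it is running. */
    static final class Container {
        private final StateStore store;
        private long enforcedMb; // limit applied by the monitor / cgroup
        private long usableMb;   // size the AM has been told it may use

        Container(long initialMb, StateStore store) {
            this.store = store;
            this.enforcedMb = initialMb;
            this.usableMb = initialMb;
        }

        /** Returns true only once every increase step has finished. */
        boolean increase(long targetMb) {
            if (targetMb <= usableMb) {
                throw new IllegalArgumentException("target must exceed current size");
            }
            try {
                enforcedMb = targetMb;  // step 1: raise the monitor/cgroup limit
                store.write(targetMb);  // step 2: persist for recovery
            } catch (IllegalStateException e) {
                enforcedMb = usableMb;  // assumed rollback: undo step 1
                return false;           // the AM must keep using the old size
            }
            usableMb = targetMb;        // step 3: only now confirm to the AM
            return true;
        }

        long usableMb() {
            return usableMb;
        }

        long enforcedMb() {
            return enforcedMb;
        }
    }

    public static void main(String[] args) {
        StateStore store = new StateStore(5 * 1024);
        Container container = new Container(5 * 1024, store);
        store.failNextWrite();
        System.out.println("confirmed: " + container.increase(10 * 1024));
        System.out.println("usable MB: " + container.usableMb()
                + ", recoverable MB: " + store.recover());
    }
}
```

With this ordering, the size the AM is allowed to use always matches what recovery would read back, so a crash between steps leaves everyone agreeing on 5G instead of the app running at 10G against a recovered 5G.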
> Support changing resources of an allocated container
> ----------------------------------------------------
>
>                 Key: YARN-1197
>                 URL: https://issues.apache.org/jira/browse/YARN-1197
>             Project: Hadoop YARN
>          Issue Type: Task
>          Components: api, nodemanager, resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Wangda Tan
>         Attachments: YARN-1197 old-design-docs-patches-for-reference.zip,
>                      YARN-1197_Design.pdf
>
>
> The current YARN resource management logic assumes resource allocated to a
> container is fixed during the lifetime of it. When users want to change a
> resource of an allocated container the only way is releasing it and
> allocating a new container with expected size.
> Allowing run-time changing resources of an allocated container will give us
> better control of resource usage in application side

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)