Wangda Tan commented on YARN-1197:
Thanks for the comments, [~sseth]/[~sandyr].
Now I'm convinced, given the views of two downstream developers. +1 to doing
AM-RM-AM-NM (a) for increase, as the original design doc proposed before (b). I'm not
sure (b) is really required; we can do (b) if there are real use cases.
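A minimal sketch of flow (a), assuming a toy in-memory model: the AM asks the RM for an increase, the RM returns an approval, and the AM forwards that approval to the NM, which applies it. `ResourceManagerSketch`, `NodeManagerSketch`, `IncreaseApproval` and `am_increase` are hypothetical names for illustration only, not YARN classes or RPCs.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class IncreaseApproval:
    """Hypothetical RM-issued approval the NM must receive before resizing."""
    container_id: str
    target_mb: int


class ResourceManagerSketch:
    def __init__(self, free_mb: int) -> None:
        self.free_mb = free_mb
        self.allocated_mb: Dict[str, int] = {}

    def request_increase(self, cid: str, target_mb: int) -> Optional[IncreaseApproval]:
        delta = target_mb - self.allocated_mb[cid]
        if delta <= 0 or delta > self.free_mb:
            return None  # not an increase, or not enough free capacity
        # The RM charges the app for the larger size as soon as it approves.
        self.free_mb -= delta
        self.allocated_mb[cid] = target_mb
        return IncreaseApproval(cid, target_mb)


class NodeManagerSketch:
    def __init__(self) -> None:
        self.limit_mb: Dict[str, int] = {}

    def apply_increase(self, approval: IncreaseApproval) -> None:
        # The NM only raises a limit when handed an RM approval.
        self.limit_mb[approval.container_id] = approval.target_mb


def am_increase(rm: ResourceManagerSketch, nm: NodeManagerSketch,
                cid: str, target_mb: int) -> bool:
    """Flow (a): AM -> RM (ask), RM -> AM (approval), AM -> NM (apply)."""
    approval = rm.request_increase(cid, target_mb)
    if approval is None:
        return False
    nm.apply_increase(approval)
    return True
```

Because the NM only acts on an RM approval, the RM stays the single authority on how much each container may use. A fuller design would also have to handle an approval that the AM never forwards to the NM (for example by expiring it); that case is left out of this sketch.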
bq. More broadly, just because YARN is not good at hitting sub-second latencies
doesn't mean that it isn't a design goal. I strongly oppose any argument that
uses the current slowness of YARN as a justification for why we should make
architectural decisions that could compromise latencies.
Makes sense to me.
bq. I.e. that an AM can receive an increase from the RM, then issue a decrease
to the NM, and then use its increase to get resources it doesn't deserve?
Yes. If we send increase requests to the RM but decrease requests to the NM, we
need to handle complex inconsistencies on the RM side. See the latest
design doc for more details.
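To make the inconsistency in the quoted question concrete, here is a toy replay of that race under a hypothetical split-path model: the RM approves an increase, the AM then shrinks the container directly at the NM (the RM learns of the decrease and releases capacity), and finally the AM forwards the old approval. The per-container `version` check is an assumed mitigation added only for illustration, not something taken from the design doc; `SplitPathModel`, `Approval` and `replay_race` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Approval:
    container_id: str
    target_mb: int
    version: int  # hypothetical: container resource version at approval time


class SplitPathModel:
    """Hypothetical model: increases are approved by the RM, while decreases go
    straight to the NM and are reported back to the RM (e.g. on heartbeat)."""

    def __init__(self, cid: str, size_mb: int, check_versions: bool) -> None:
        self.rm_mb: Dict[str, int] = {cid: size_mb}  # what the RM accounts for
        self.nm_mb: Dict[str, int] = {cid: size_mb}  # what the NM enforces
        self.version: Dict[str, int] = {cid: 0}      # bumped on each NM change
        self.check_versions = check_versions

    def rm_approve_increase(self, cid: str, target_mb: int) -> Approval:
        self.rm_mb[cid] = target_mb
        return Approval(cid, target_mb, self.version[cid])

    def nm_decrease(self, cid: str, target_mb: int) -> None:
        self.nm_mb[cid] = target_mb
        self.version[cid] += 1
        self.rm_mb[cid] = target_mb  # RM learns of the decrease, frees capacity

    def nm_apply_increase(self, approval: Approval) -> bool:
        cid = approval.container_id
        if self.check_versions and approval.version != self.version[cid]:
            return False  # stale approval: container changed since it was issued
        self.nm_mb[cid] = approval.target_mb
        self.version[cid] += 1
        return True

    def overuse_mb(self, cid: str) -> int:
        """MB the NM lets the container use beyond what the RM accounts for."""
        return max(0, self.nm_mb[cid] - self.rm_mb[cid])


def replay_race(check_versions: bool) -> int:
    m = SplitPathModel("c1", 5120, check_versions)
    approval = m.rm_approve_increase("c1", 10240)  # AM -> RM: 5G to 10G
    m.nm_decrease("c1", 3072)                      # AM -> NM: shrink to 3G
    m.nm_apply_increase(approval)                  # AM -> NM: apply old approval
    return m.overuse_mb("c1")
```

Without the check, `replay_race(False)` leaves the NM enforcing 10G while the RM accounts for only 3G, i.e. 7168 MB the app does not deserve; with the check, `replay_race(True)` rejects the stale approval and returns 0.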
bq. I don't think it's possible for the AM to start using the additional
allocation till the NM has updated all it's state - including writing out
recovery information for work preserving restart (Thanks Vinod for pointing
this out). Seems like that poll/callback will be required - unless the plan is
to route this information via the RM.
Maybe we need to wait for all increase steps (monitor/cgroup/state-store) to finish
before using the additional allocation. Suppose a container is 5G and is increased to 10G,
the RM/NM crashes before writing to the state store, and the app starts using 10G. After RM
restart/recovery, the NM/RM will think the container is 5G, which will be
[~mding], do you agree with doing (a)?
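A minimal sketch of the "wait for all increase steps" ordering above, modeling only an NM restart: the NM applies the increase in the listed steps (monitor, cgroup, state store) and acknowledges the new size to the AM only after the durable state-store write. `StagedNodeManager` and `SimulatedCrash` are hypothetical names, and each step is modeled as a plain dictionary update rather than real monitoring, cgroup, or state-store code.

```python
from typing import Dict, Optional


class SimulatedCrash(Exception):
    """Raised to model the NM dying partway through an increase."""


class StagedNodeManager:
    """Hypothetical NM that applies a container increase in ordered steps."""

    def __init__(self, cid: str, size_mb: int) -> None:
        self.monitor_mb: Dict[str, int] = {cid: size_mb}   # in memory
        self.cgroup_mb: Dict[str, int] = {cid: size_mb}    # in memory
        self.state_store: Dict[str, int] = {cid: size_mb}  # durable across restarts
        self.acked_mb: Dict[str, int] = {cid: size_mb}     # size the AM may use

    @staticmethod
    def _checkpoint(step: str, crash_before: Optional[str]) -> None:
        if step == crash_before:
            raise SimulatedCrash(f"crashed before {step}")

    def increase(self, cid: str, target_mb: int,
                 crash_before: Optional[str] = None) -> None:
        self._checkpoint("monitor", crash_before)
        self.monitor_mb[cid] = target_mb
        self._checkpoint("cgroup", crash_before)
        self.cgroup_mb[cid] = target_mb
        self._checkpoint("state_store", crash_before)
        self.state_store[cid] = target_mb
        # Acknowledge last, so the AM never uses more than the store can recover.
        self.acked_mb[cid] = target_mb

    def restart(self) -> None:
        """Rebuild in-memory state from the durable store, as after a crash."""
        self.monitor_mb = dict(self.state_store)
        self.cgroup_mb = dict(self.state_store)
        self.acked_mb = dict(self.state_store)
```

If the NM crashes before the state-store write, the AM was never told it could use 10G, so the 5G recovered after restart matches what the app was allowed to use.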
> Support changing resources of an allocated container
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
> Issue Type: Task
> Components: api, nodemanager, resourcemanager
> Affects Versions: 2.1.0-beta
> Reporter: Wangda Tan
> Attachments: YARN-1197 old-design-docs-patches-for-reference.zip,
> The current YARN resource management logic assumes the resource allocated to a
> container is fixed during its lifetime. When users want to change a resource
> of an allocated container, the only way is to release it and allocate a new
> container with the expected size.
> Allowing the resources of an allocated container to be changed at run time will give us
> better control of resource usage on the application side.