Wangda Tan commented on YARN-1197:

Thanks for the comments, [~sseth]/[~sandyr].

Now I'm convinced, from the view of two downstream developers. +1 to doing the 
AM-RM-AM-NM approach (a) for increase, as in the original doc, before (b). I'm 
not sure (b) is really required; we can do (b) if there are any real use cases.

bq. More broadly, just because YARN is not good at hitting sub-second latencies 
doesn't mean that it isn't a design goal. I strongly oppose any argument that 
uses the current slowness of YARN as a justification for why we should make 
architectural decisions that could compromise latencies.
Makes sense to me.

bq. I.e. that an AM can receive an increase from the RM, then issue a decrease 
to the NM, and then use its increase to get resources it doesn't deserve?
Yes, if we send the increase request to the RM but the decrease request to the 
NM, we need to handle complex inconsistencies on the RM side. You can take a 
look at the latest design doc for more details.

bq. I don't think it's possible for the AM to start using the additional 
allocation till the NM has updated all its state - including writing out 
recovery information for work preserving restart (Thanks Vinod for pointing 
this out). Seems like that poll/callback will be required - unless the plan is 
to route this information via the RM.
Maybe we need to wait for all increase steps (monitor/cgroup/state-store) to 
finish before using the additional allocation. Suppose a container is 5G and is 
increased to 10G, the RM/NM crashes before writing to the state store, and the 
app starts using 10G. After RM 
restart/recovery, NM/RM will think the container is 5G, that will be 
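
To make the ordering concern above concrete, here is a minimal Python sketch. It is not YARN code: `StateStore`, `NodeManager`, `increase`, `recover` and the `crash_before_persist` flag are hypothetical names invented for illustration, and the real monitor/cgroup steps are reduced to an in-memory update. It only shows why the acknowledgement to the AM should probably come after the state-store write.

```python
class StateStore:
    """Hypothetical durable store: its contents survive a simulated crash."""

    def __init__(self):
        self.sizes_gb = {}

    def save(self, container_id, size_gb):
        self.sizes_gb[container_id] = size_gb


class NodeManager:
    """Hypothetical NM model whose in-memory view is rebuilt from the store."""

    def __init__(self, store):
        self.store = store
        # On (re)start, the only sizes the NM knows are the persisted ones.
        self.enforced_gb = dict(store.sizes_gb)

    def increase(self, container_id, new_size_gb, crash_before_persist=False):
        # Step 1: update in-memory enforcement (stands in for monitor/cgroup).
        self.enforced_gb[container_id] = new_size_gb
        if crash_before_persist:
            raise RuntimeError("simulated crash before state-store write")
        # Step 2: persist the new size.
        self.store.save(container_id, new_size_gb)
        # Step 3: only now acknowledge, so the AM may start using the increase.
        return True


def recover(store):
    """Simulate a restart: a fresh NM that sees only what was persisted."""
    return NodeManager(store)


store = StateStore()
store.save("container_1", 5)
nm = NodeManager(store)

try:
    acked = nm.increase("container_1", 10, crash_before_persist=True)
except RuntimeError:
    acked = False  # no ack reached the AM, so it must keep using 5G

size_after_crash = recover(store).enforced_gb["container_1"]

acked_ok = recover(store).increase("container_1", 10)
size_after_success = recover(store).enforced_gb["container_1"]
```

In this toy model the AM never sees an acknowledgement for an increase that was lost in the crash, so the size it uses and the size seen after recovery stay the same. Whether the acknowledgement reaches the AM by poll, callback, or via the RM is left open here, as in the discussion.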

[~mding], do you agree with doing (a)?

> Support changing resources of an allocated container
> ----------------------------------------------------
>                 Key: YARN-1197
>                 URL: https://issues.apache.org/jira/browse/YARN-1197
>             Project: Hadoop YARN
>          Issue Type: Task
>          Components: api, nodemanager, resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Wangda Tan
>         Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, 
> YARN-1197_Design.pdf
> The current YARN resource management logic assumes the resource allocated to a 
> container is fixed during its lifetime. When users want to change the 
> resources of an allocated container, the only way is to release it and 
> allocate a new container with the expected size.
> Allowing run-time changes to the resources of an allocated container will 
> give us better control of resource usage on the application side.

