Wangda Tan commented on YARN-1197:

I think increasing via AM<->NM and via RM<->NM have a very similar range of delay
(multiple seconds for now).

a. AM<->NM needs 3 stages:
1) AM gets the increase token from RM
2) AM sends the increase token to NM
3) AM polls NM for the increase status (because we cannot assume the increase
can be done very fast on the NM side)
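The AM-side flow of a. can be sketched as a toy simulation. This is not YARN's API: IncreaseToken, FakeResourceManager, FakeNodeManager, am_driven_increase and the polls_until_done parameter are all hypothetical, and real heartbeats, RPCs and sleeps between polls are elided. The point is only that the AM itself must hold the token, submit it, and poll the NM.

```python
from dataclasses import dataclass


@dataclass
class IncreaseToken:
    # Hypothetical token the RM hands to the AM (illustrative only).
    container_id: str
    new_memory_mb: int


class FakeResourceManager:
    def get_increase_token(self, container_id, new_memory_mb):
        # Stage 1: in a real system the token would arrive via the AM heartbeat.
        return IncreaseToken(container_id, new_memory_mb)


class FakeNodeManager:
    def __init__(self, polls_until_done):
        # Simulates an NM whose local increase needs several polls to finish.
        self.polls_until_done = polls_until_done
        self.pending = {}
        self.memory_mb = {}

    def submit_increase(self, token):
        # Stage 2: the AM hands the token to the NM.
        self.pending[token.container_id] = (token.new_memory_mb, self.polls_until_done)

    def poll_increase_status(self, container_id):
        # Stage 3: the AM asks whether the increase has finished.
        new_mb, remaining = self.pending[container_id]
        if remaining <= 1:
            self.memory_mb[container_id] = new_mb
            del self.pending[container_id]
            return True
        self.pending[container_id] = (new_mb, remaining - 1)
        return False


def am_driven_increase(rm, nm, container_id, new_memory_mb, max_polls=100):
    """Run stages 1-3 of flow a. and return how many polls were needed."""
    token = rm.get_increase_token(container_id, new_memory_mb)
    nm.submit_increase(token)
    for polls in range(1, max_polls + 1):
        if nm.poll_increase_status(container_id):
            return polls
        # A real AM would wait one poll interval here before asking again.
    raise TimeoutError("increase did not finish")
```

For example, `am_driven_increase(FakeResourceManager(), FakeNodeManager(3), "c1", 4096)` returns 3: the AM needed three polls before it could use the larger container.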

b. RM->NM needs 4 stages:
1) RM sends the increase token back to NM
2) NM performs the increase locally
3) NM reports back to RM when the increase is done
4) RM notifies AM that the increase is done
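Flow b. can be sketched the same way, assuming every hand-off rides on an existing heartbeat. RMDrivenIncrease, nm_heartbeat, am_heartbeat and the nm_state dict are hypothetical names for illustration, and the local increase on the NM is treated as instantaneous.

```python
from collections import defaultdict


class RMDrivenIncrease:
    """Toy model of flow b.: the AM never sees a token, only the final result."""

    def __init__(self):
        self.to_nm = defaultdict(list)  # node -> [(container_id, mb)] awaiting that NM's heartbeat
        self.done_for_am = []           # increases confirmed by NMs, awaiting the AM heartbeat

    def request_increase(self, node, container_id, new_memory_mb):
        # The RM has approved the increase; the token waits for the node's next heartbeat.
        self.to_nm[node].append((container_id, new_memory_mb))

    def nm_heartbeat(self, node, nm_state):
        # nm_state is the simulated NM: {"memory_mb": {...}, "completed": [...]}.
        # Stage 3: the NM reports increases it finished since its previous heartbeat.
        self.done_for_am.extend(nm_state["completed"])
        nm_state["completed"] = []
        # Stage 1: the RM returns pending tokens in the heartbeat response.
        tokens, self.to_nm[node] = self.to_nm[node], []
        # Stage 2: the NM applies each increase locally (instantaneous here).
        for container_id, mb in tokens:
            nm_state["memory_mb"][container_id] = mb
            nm_state["completed"].append(container_id)

    def am_heartbeat(self):
        # Stage 4: the RM tells the AM which increases are done.
        done, self.done_for_am = self.done_for_am, []
        return done
```

In this model the AM only learns about the increase after two NM heartbeats and one AM heartbeat; the second NM heartbeat is the extra interval that b. costs compared with a.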

Solution b. has one additional NM->RM heartbeat interval.

Benefits of b. (some of them were also mentioned by Meng):
- Simpler for the AM: it only needs to know when the increase is done, and
doesn't need to receive the token, submit it to the NM, and poll the NM.
- Creates a consistent way for applications to increase/decrease containers.
- Recovery is simpler: the AM only learns of an increase once it has finished,
so we only need to handle recovery of 2 components (NM/RM) instead of 3
(NM/RM/AM).

Until we have a fast-scheduling design/plan (I don't think we can support
millisecond scheduling for now; too-frequent AM heartbeats would overload the
RM), I don't think adding one more NM->RM heartbeat interval is a big problem.
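A back-of-envelope sketch of why shrinking the AM heartbeat interval does not scale: the RM's heartbeat load grows linearly with the number of AMs and inversely with the interval. The function name and the cluster sizes below are hypothetical and purely illustrative, not measurements.

```python
def rm_heartbeat_rate(num_ams, interval_ms):
    """AM->RM heartbeats per second the RM must process (simple linear model)."""
    return num_ams * 1000 / interval_ms


# Illustrative only: 1000 AMs at a 1 s interval vs. a 10 ms interval.
for interval_ms in (1000, 10):
    print(interval_ms, rm_heartbeat_rate(1000, interval_ms))
```

With these made-up numbers, going from 1 s to 10 ms heartbeats raises the RM load from 1,000 to 100,000 heartbeats per second, whereas b. only adds latency (one NM heartbeat interval) without adding RPC load.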

> Support changing resources of an allocated container
> ----------------------------------------------------
>                 Key: YARN-1197
>                 URL: https://issues.apache.org/jira/browse/YARN-1197
>             Project: Hadoop YARN
>          Issue Type: Task
>          Components: api, nodemanager, resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Wangda Tan
>         Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, 
> YARN-1197_Design.pdf
> The current YARN resource management logic assumes resource allocated to a 
> container is fixed during the lifetime of it. When users want to change a
> resource of an allocated container the only way is releasing it and
> allocating a new container with expected size.
> Allowing run-time changing resources of an allocated container will give us 
> better control of resource usage in application side

This message was sent by Atlassian JIRA
