Sandy Ryza commented on YARN-1197:

Option (a) can complete in the low hundreds of milliseconds if the cluster is
tuned properly, independent of cluster size:
1) Submit the increase request to the RM.  Poll the RM about 100 milliseconds
later, once the continuous scheduling thread has run, to pick up the increase
token.
2) Send the increase token to the NM.
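
As a rough illustration, the two steps above can be sketched from the AM's
side as follows.  This is a minimal simulation, not the YARN API: FakeRM,
FakeNM, increase_container and all of their methods and parameters are
hypothetical names invented for the sketch, and the "token" is just a dict.

```python
import time


class FakeRM:
    """Hypothetical stand-in for the ResourceManager (not the YARN API)."""

    def __init__(self, scheduling_delay_s):
        # Assumed time until the continuous scheduling thread has run.
        self.scheduling_delay_s = scheduling_delay_s
        self._pending = {}

    def request_increase(self, container_id, memory_mb):
        ready_at = time.monotonic() + self.scheduling_delay_s
        self._pending[container_id] = (ready_at, memory_mb)

    def poll_increase_token(self, container_id):
        ready_at, memory_mb = self._pending[container_id]
        if time.monotonic() < ready_at:
            return None  # scheduler has not decided yet
        return {"container_id": container_id, "memory_mb": memory_mb}


class FakeNM:
    """Hypothetical NodeManager that only updates its resource tracking."""

    def __init__(self):
        self.allotted_mb = {}

    def apply_increase(self, token):
        self.allotted_mb[token["container_id"]] = token["memory_mb"]


def increase_container(rm, nm, container_id, memory_mb,
                       poll_interval_s=0.1, max_polls=10):
    # Step 1: submit the request, then poll the RM for the increase token.
    rm.request_increase(container_id, memory_mb)
    for _ in range(max_polls):
        time.sleep(poll_interval_s)
        token = rm.poll_increase_token(container_id)
        if token is not None:
            # Step 2: hand the token to the NM.
            nm.apply_increase(token)
            return token
    raise TimeoutError("no increase token after %d polls" % max_polls)
```

With a scheduling delay shorter than the poll interval, the first poll already
returns the token, so the AM pays for one wait and three calls in total.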

Why does the AM need to poll the NM about increase status before taking
action?  Does the NM need to do anything other than update its tracking of the
resources allotted to the container?

Also, it's not unlikely that schedulers will be improved to return the increase
token on the same heartbeat on which it's requested.  So this could all happen
in 2 RPCs plus a scheduler decision, with no additional wait time.  Anything
slower than this is probably prohibitively expensive for a framework like
Spark, which would submit an increase request before running each task.
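
To make the latency comparison concrete, here is a back-of-the-envelope
sketch.  The function names are made up for illustration, and any RPC,
scheduler, or task durations passed in are assumptions to be replaced with
measured values; nothing here is a measurement from a real cluster.

```python
def polled_latency_ms(rpc_ms, poll_wait_ms):
    # Request to RM, wait for the scheduling thread, poll RM, send token to NM.
    # Assumes the scheduler decision fits inside the poll wait.
    return 3 * rpc_ms + poll_wait_ms


def same_heartbeat_latency_ms(rpc_ms, scheduler_ms):
    # One RM call that returns the token directly, plus one NM call.
    return 2 * rpc_ms + scheduler_ms


def overhead_fraction(latency_ms, task_ms):
    # Share of total time spent waiting on the increase rather than running.
    return latency_ms / (latency_ms + task_ms)
```

For example, with an assumed 5 ms per RPC, a 100 ms poll wait, and a 10 ms
scheduler decision, the polled path costs 115 ms and the same-heartbeat path
costs 20 ms.  For short tasks the difference dominates the overhead fraction.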

Would option (b) ever be able to achieve this kind of latency?

> Support changing resources of an allocated container
> ----------------------------------------------------
>                 Key: YARN-1197
>                 URL: https://issues.apache.org/jira/browse/YARN-1197
>             Project: Hadoop YARN
>          Issue Type: Task
>          Components: api, nodemanager, resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Wangda Tan
>         Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, 
> YARN-1197_Design.pdf
> The current YARN resource management logic assumes the resources allocated
> to a container are fixed for its lifetime. When users want to change the
> resources of an allocated container, the only way is to release it and
> allocate a new container of the expected size.
> Allowing the resources of an allocated container to change at run time will
> give us better control of resource usage on the application side.

This message was sent by Atlassian JIRA
