Siddharth Seth commented on YARN-1197:

bq. I would argue that waiting for an NM-RM heartbeat is much worse than 
waiting for an AM-RM heartbeat. With continuous scheduling, the RM can make 
decisions in millisecond time, and the AM can regulate its heartbeats according 
to the application's needs to get fast responses. If an NM-RM heartbeat is 
involved, the application is at the mercy of the cluster settings, which should 
be in the multi-second range for large clusters.
I tend to agree with Sandy's arguments about option a being better in terms of 
latency - and that we shouldn't be architecting this in a manner which would 
limit it to the seconds range rather than milliseconds / hundreds of 
milliseconds when possible.
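
As a rough illustration of the latency argument above, here is a back-of-envelope sketch of how long a scheduling decision waits for the next heartbeat on each path. The interval values (100 ms for an AM-RM heartbeat, 3000 ms for an NM-RM heartbeat) and the helper name `heartbeat_wait_ms` are assumptions made for the example, not YARN defaults or YARN APIs, and the model ignores RPC and processing time.

```python
from typing import Tuple


def heartbeat_wait_ms(interval_ms: float) -> Tuple[float, float]:
    """Return (average, worst-case) wait in ms for the next heartbeat.

    Assumes the decision becomes ready at a uniformly random point in the
    heartbeat period, so the average wait is half the interval.
    """
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return interval_ms / 2.0, float(interval_ms)


# Hypothetical settings, chosen only for illustration:
# an AM that heartbeats every 100 ms, and a cluster-wide
# NM-RM heartbeat of 3000 ms.
am_avg, am_worst = heartbeat_wait_ms(100)
nm_avg, nm_worst = heartbeat_wait_ms(3000)

print(f"AM-RM path: avg {am_avg:.0f} ms, worst {am_worst:.0f} ms")
print(f"NM-RM path: avg {nm_avg:.0f} ms, worst {nm_worst:.0f} ms")
```

Under these assumed numbers the heartbeat wait alone differs by a factor of 30, which is the seconds-versus-milliseconds gap the argument is about.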

It's already possible to get fast allocations - low 100s of milliseconds via a 
scheduler loop which is delinked from NM heartbeats and a variable AM-RM 
heartbeat interval, which is under user control rather than being a cluster 

There are going to be improvements to the performance of various protocols in 
YARN. HADOOP-11552 opens up one such option which allows AMs to know about 
allocations as soon as the scheduler has made the decision, without a 
requirement to poll. Of course, there's plenty of work to be done before that 
can actually be used :)

That said, callbacks on the RPC can be applied at various levels - including 
NM-RM communication, which can make option b work fast as well. However, it 
will incur the cost of additional RPC roundtrips. Option a, on the other hand, 
can be fast from the get-go with tuning, and also gets better with future 
enhancements.

I don't think it's possible for the AM to start using the additional allocation 
till the NM has updated all its state - including writing out recovery 
information for work preserving restart (Thanks Vinod for pointing this out). 
Seems like that poll/callback will be required - unless the plan is to route 
this information via the RM.
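
A minimal sketch of what the poll variant could look like from the AM side, assuming the NM exposes some way to ask whether the increase has been fully applied (including the recovery write). The names `wait_until_applied`, `FakeNodeManager` and `increase_applied` are hypothetical stand-ins for the example; none of them is an existing YARN API.

```python
import time
from typing import Callable


def wait_until_applied(is_applied: Callable[[], bool],
                       timeout_s: float = 5.0,
                       poll_interval_s: float = 0.05) -> bool:
    """Poll is_applied() until it returns True or the timeout expires.

    Returns True if the change was observed as applied, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_applied():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)


class FakeNodeManager:
    """Hypothetical stand-in for an NM: reports the increase as applied
    only after it has been asked a given number of times."""

    def __init__(self, ready_after_checks: int) -> None:
        self.ready_after_checks = ready_after_checks
        self.checks = 0

    def increase_applied(self) -> bool:
        self.checks += 1
        return self.checks >= self.ready_after_checks


nm = FakeNodeManager(ready_after_checks=3)
if wait_until_applied(nm.increase_applied, timeout_s=1.0, poll_interval_s=0.01):
    print("safe to start using the additional allocation")
else:
    print("timed out waiting for the NM to apply the increase")
```

A callback would replace the loop with a notification from the NM, at the cost of the NM having to track who to notify.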

> Support changing resources of an allocated container
> ----------------------------------------------------
>                 Key: YARN-1197
>                 URL: https://issues.apache.org/jira/browse/YARN-1197
>             Project: Hadoop YARN
>          Issue Type: Task
>          Components: api, nodemanager, resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Wangda Tan
>         Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, 
> YARN-1197_Design.pdf
> The current YARN resource management logic assumes the resource allocated to a 
> container is fixed during its lifetime. When users want to change the resources 
> of an allocated container, the only way is to release it and allocate a new 
> container with the expected size.
> Allowing run-time changes to the resources of an allocated container will give 
> us better control of resource usage on the application side.

This message was sent by Atlassian JIRA
