[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588847#comment-14588847 ]
Siddharth Seth commented on YARN-1197:
--------------------------------------
bq. I would argue that waiting for an NM-RM heartbeat is much worse than
waiting for an AM-RM heartbeat. With continuous scheduling, the RM can make
decisions in millisecond time, and the AM can regulate its heartbeats according
to the application's needs to get fast responses. If an NM-RM heartbeat is
involved, the application is at the mercy of the cluster settings, which should
be in the multi-second range for large clusters.
I tend to agree with Sandy's arguments that option a is better in terms of
latency, and that we shouldn't architect this in a way that limits it to the
seconds range rather than milliseconds or hundreds of milliseconds when
possible.
It's already possible to get fast allocations (in the low hundreds of
milliseconds) by combining a scheduler loop that is decoupled from NM
heartbeats with a variable AM-RM heartbeat interval, which is under user
control rather than being a cluster property.
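To make the latency argument concrete, here is a rough back-of-the-envelope model. It is a sketch under hypothetical assumptions, not a description of YARN internals: heartbeats are strictly periodic, a decision lands at a uniformly random point within an interval (so the average wait is half the interval and the worst case is the full interval), continuous scheduling makes the RM's decision time negligible, and option b puts exactly one extra NM-RM heartbeat on the critical path. The 100 ms and 3000 ms intervals are illustrative values, not YARN defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Heartbeats:
    am_rm_ms: float  # AM-RM interval, chosen by the application
    nm_rm_ms: float  # NM-RM interval, a cluster-wide setting


def wait_ms(interval_ms: float, worst: bool) -> float:
    """Wait for the next periodic heartbeat: half the interval on average
    (uniformly random phase), the full interval in the worst case."""
    return interval_ms if worst else interval_ms / 2


def resize_latency_ms(hb: Heartbeats, option: str, worst: bool = False,
                      scheduling_ms: float = 0.0) -> float:
    # Option a: with continuous scheduling the RM decides almost immediately,
    # and the AM learns the outcome on its next AM-RM heartbeat.
    latency = scheduling_ms + wait_ms(hb.am_rm_ms, worst)
    if option == "b":
        # Option b (simplified assumption): one NM-RM heartbeat is also on
        # the critical path.
        latency += wait_ms(hb.nm_rm_ms, worst)
    elif option != "a":
        raise ValueError(f"unknown option: {option!r}")
    return latency


hb = Heartbeats(am_rm_ms=100, nm_rm_ms=3000)  # illustrative values only
for opt in ("a", "b"):
    print(opt, resize_latency_ms(hb, opt), resize_latency_ms(hb, opt, worst=True))
# a 50.0 100.0
# b 1550.0 3100.0
```

Under these assumptions the application controls option a's latency through its own heartbeat interval, whereas option b's latency is dominated by the cluster's NM-RM interval.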
There are going to be improvements to the performance of various protocols in
YARN. HADOOP-11552 opens up one such option, which allows AMs to learn about
allocations as soon as the scheduler has made the decision, without a
requirement to poll. Of course, there's plenty of work to be done before that
can actually be used :)
That said, callbacks on the RPC can be applied at various levels, including
NM-RM communication, which can make option b fast as well. However, that will
incur the cost of additional RPC roundtrips. Option a, on the other hand, can be
fast from the get-go with tuning, and also gets better with future enhancements.
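A similar toy model contrasts polling with server push. It is not the HADOOP-11552 API; it only assumes (hypothetically) that a client polls at a fixed interval and that a pushed callback costs a constant per-hop delivery time, with the hypothetical `hops` parameter standing in for the extra RPC legs that a callback-based option b would add.

```python
import math


def poll_delay_ms(decision_ms: float, poll_interval_ms: float,
                  first_poll_ms: float = 0.0) -> float:
    """Delay until a client polling at first_poll_ms + k * poll_interval_ms
    (k = 0, 1, 2, ...) first observes a decision made at decision_ms."""
    if poll_interval_ms <= 0:
        raise ValueError("poll_interval_ms must be positive")
    if decision_ms <= first_poll_ms:
        return first_poll_ms - decision_ms
    # Index of the first poll at or after the decision.
    k = math.ceil((decision_ms - first_poll_ms) / poll_interval_ms)
    return first_poll_ms + k * poll_interval_ms - decision_ms


def push_delay_ms(per_hop_ms: float, hops: int = 1) -> float:
    """Delay when the server pushes the decision: one delivery per hop,
    with no dependence on any polling interval."""
    return hops * per_hop_ms


print(poll_delay_ms(130, 1000))  # 870.0
print(poll_delay_ms(130, 100))   # 70.0
print(push_delay_ms(2, hops=3))  # 6
```

Push removes the interval-bound wait entirely, but each extra hop still costs an RPC delivery, which is the trade-off described for a callback-based option b.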
I don't think it's possible for the AM to start using the additional allocation
until the NM has updated all its state, including writing out recovery
information for work-preserving restart (thanks to Vinod for pointing this out).
So it seems that a poll or callback will be required, unless the plan is to
route this information via the RM.
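The ordering constraint above can be sketched with a future that completes only after the NM has recorded its state. `NodeManagerSketch`, `increase`, and the dict-based `recovery_store` are hypothetical stand-ins, not the real NodeManager or its state store; the point is that the AM can either poll `done()` or register a callback, and in both cases it observes the new size only after the recovery record exists.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, List


class NodeManagerSketch:
    """Hypothetical NM-side resize handling; not the real NodeManager API."""

    def __init__(self, executor: ThreadPoolExecutor) -> None:
        self._executor = executor
        self.recovery_store: Dict[str, int] = {}  # stand-in for the NM recovery store
        self.container_mb: Dict[str, int] = {}    # in-memory container sizes

    def increase(self, container_id: str, new_mb: int) -> "Future[int]":
        # The returned future completes only after _apply has updated both the
        # recovery record and the in-memory state.
        return self._executor.submit(self._apply, container_id, new_mb)

    def _apply(self, container_id: str, new_mb: int) -> int:
        self.recovery_store[container_id] = new_mb  # record for work-preserving restart
        self.container_mb[container_id] = new_mb    # then update the live state
        return new_mb


seen: List[int] = []
with ThreadPoolExecutor(max_workers=1) as pool:
    nm = NodeManagerSketch(pool)
    fut = nm.increase("container_01", 4096)
    # Callback style: runs once the NM-side update has finished.
    fut.add_done_callback(lambda f: seen.append(f.result()))
    # Poll style: the AM checks periodically instead of being notified.
    while not fut.done():
        time.sleep(0.01)
    assert fut.result() == 4096
# Leaving the with-block waits for the worker thread, so the callback has run.
print(seen, nm.recovery_store)  # [4096] {'container_01': 4096}
```

Whichever style the AM uses, the signal it waits on is produced by the NM after its state is durable, which is why some poll or callback path is needed unless the RM relays it.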
> Support changing resources of an allocated container
> ----------------------------------------------------
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
> Issue Type: Task
> Components: api, nodemanager, resourcemanager
> Affects Versions: 2.1.0-beta
> Reporter: Wangda Tan
> Attachments: YARN-1197 old-design-docs-patches-for-reference.zip,
> YARN-1197_Design.pdf
>
>
> The current YARN resource management logic assumes that the resources
> allocated to a container are fixed for its lifetime. When users want to
> change the resources of an allocated container, the only way is to release
> it and allocate a new container of the expected size.
> Allowing the resources of an allocated container to change at run time will
> give applications better control over resource usage.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)