[
https://issues.apache.org/jira/browse/YARN-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782473#comment-13782473
]
Sandy Ryza commented on YARN-1010:
----------------------------------
This looks almost there to me. A few nits:
{code}
+ LOG.warn("Error while doing sleep in continuous scheduling: " +
+ e.toString(), e);
{code}
There should be indentation on the second line here.
{code}
+ private void continuousScheduling() {
{code}
Better to have method names be verbs. Maybe "scheduleContinuously".
Most of the Fair Scheduler properties use dashes instead of dots in the last
part of the name, and I think this is a good convention. We should change
yarn.scheduler.fair.locality.threshold.node.time.ms to
yarn.scheduler.fair.locality-delay-node-ms. (And the same for rack). We should
also change yarn.scheduler.fair.continuous.scheduling.enabled to
yarn.scheduler.fair.continuous-scheduling-enabled and
yarn.scheduler.fair.continuous.scheduling.sleep.time.ms to
yarn.scheduler.fair.continuous-scheduling-sleep-ms.
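To make the renames easier to scan, here is a rough sketch that collects them in one place. The class name FairSchedulerPropertyRenames is hypothetical and not part of the patch, and the rack entries are inferred from "the same for rack" rather than spelled out above, so treat them as an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the patch: maps each dotted property
// name to the dashed name suggested in the review.
class FairSchedulerPropertyRenames {
    static final String PREFIX = "yarn.scheduler.fair.";

    static Map<String, String> renames() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put(PREFIX + "locality.threshold.node.time.ms",
            PREFIX + "locality-delay-node-ms");
        // Assumption: rack names inferred by analogy with the node property.
        m.put(PREFIX + "locality.threshold.rack.time.ms",
            PREFIX + "locality-delay-rack-ms");
        m.put(PREFIX + "continuous.scheduling.enabled",
            PREFIX + "continuous-scheduling-enabled");
        m.put(PREFIX + "continuous.scheduling.sleep.time.ms",
            PREFIX + "continuous-scheduling-sleep-ms");
        return m;
    }

    public static void main(String[] args) {
        // Print one "old -> new" pair per line, in insertion order.
        for (Map.Entry<String, String> e : renames().entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

After the prefix, each new name keeps dots only between namespaces and uses dashes inside the final segment.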
Adding multi-second sleeps in the unit tests will slow down build times and is
still theoretically open to races if the OS pauses. Better would be to use the
clock interface. In the test you can use a MockClock like in
TestFairScheduler#testChoiceOfPreemptedContainers, and you can change the start
time in AppSchedulable to come from scheduler.getClock().getTime().
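A minimal, self-contained sketch of the idea, assuming a clock abstraction with a single getTime() method. The Clock, SystemClock and MockClock types here are simplified stand-ins written for illustration, not the actual Hadoop or test classes, and DelayTracker is a hypothetical stand-in for the part of AppSchedulable that records a start time.

```java
// Stand-in for the scheduler's clock abstraction: current time in ms.
interface Clock {
    long getTime();
}

// Production-style implementation that reads the wall clock.
class SystemClock implements Clock {
    @Override
    public long getTime() {
        return System.currentTimeMillis();
    }
}

// Test implementation: time advances only when the test calls tick().
class MockClock implements Clock {
    private long timeMs = 0;

    @Override
    public long getTime() {
        return timeMs;
    }

    void tick(long millis) {
        timeMs += millis;
    }
}

// Hypothetical analogue of AppSchedulable: takes its start time from the
// injected clock instead of calling System.currentTimeMillis() directly.
class DelayTracker {
    private final Clock clock;
    private final long startTimeMs;

    DelayTracker(Clock clock) {
        this.clock = clock;
        this.startTimeMs = clock.getTime();
    }

    // True once strictly more than thresholdMs has elapsed since creation.
    boolean delayExceeded(long thresholdMs) {
        return clock.getTime() - startTimeMs > thresholdMs;
    }
}

class MockClockExample {
    public static void main(String[] args) {
        MockClock clock = new MockClock();
        DelayTracker tracker = new DelayTracker(clock);
        System.out.println(tracker.delayExceeded(5000)); // false: no time has passed
        clock.tick(5001); // advance time instantly, no Thread.sleep needed
        System.out.println(tracker.delayExceeded(5000)); // true
    }
}
```

Because the test controls the clock, the delay check is deterministic and the test finishes immediately, no matter how the OS schedules the test thread.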
> FairScheduler: decouple container scheduling from nodemanager heartbeats
> ------------------------------------------------------------------------
>
> Key: YARN-1010
> URL: https://issues.apache.org/jira/browse/YARN-1010
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: scheduler
> Affects Versions: 2.1.0-beta
> Reporter: Alejandro Abdelnur
> Assignee: Wei Yan
> Priority: Critical
> Attachments: YARN-1010.patch
>
>
> Currently, scheduling for a node is done when the node heartbeats.
> For large clusters where the heartbeat interval is set to several seconds, this
> delays scheduling of incoming allocations significantly.
> We could have a continuous loop scanning all nodes and doing scheduling. If
> there is availability, AMs will get the allocation in the next heartbeat after
> the one that placed the request.