[ https://issues.apache.org/jira/browse/YARN-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782473#comment-13782473 ]

Sandy Ryza commented on YARN-1010:
----------------------------------

This looks almost there to me.  A few nits:
{code}
+        LOG.warn("Error while doing sleep in continuous scheduling: " +
+        e.toString(), e);
{code}
The second line here should be indented.

{code}
+  private void continuousScheduling() {
{code}
Method names should be verbs. Maybe "scheduleContinuously".
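For illustration, here is a minimal sketch of a continuous-scheduling loop that applies both nits: a verb-style method name and an indented continuation line in the log call. It is self-contained, so it swaps in java.util.logging for the patch's LOG. The class name, schedulePass(), stop(), and the choice to restore the interrupt flag and exit are hypothetical, not taken from the patch.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical, simplified stand-in for the scheduler under review. It only
// illustrates the verb-style method name and the continuation-line indent.
class ContinuousSchedulingSketch {
  private static final Logger LOG =
      Logger.getLogger(ContinuousSchedulingSketch.class.getName());

  private final long sleepMs;            // assumed: the configured sleep interval
  private volatile boolean stopped = false;
  private int passes = 0;                // scheduling passes run so far

  ContinuousSchedulingSketch(long sleepMs) {
    this.sleepMs = sleepMs;
  }

  // Verb-style name, as suggested in the review.
  void scheduleContinuously() {
    while (!stopped) {
      schedulePass();
      try {
        Thread.sleep(sleepMs);
      } catch (InterruptedException e) {
        // The wrapped argument is indented relative to the LOG call.
        LOG.log(Level.WARNING, "Error while doing sleep in continuous scheduling: "
            + e.toString(), e);
        // Sketch choice: restore the interrupt flag and leave the loop.
        Thread.currentThread().interrupt();
        return;
      }
    }
  }

  // Hypothetical placeholder for one scheduling pass over all nodes.
  private void schedulePass() {
    passes++;
  }

  void stop() {
    stopped = true;
  }

  int getPasses() {
    return passes;
  }
}
```

Restoring the interrupt flag lets whoever owns the thread see the interruption. Whether the real loop should exit or keep going after logging is a separate design decision for the patch.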

Most of the Fair Scheduler properties use dashes rather than dots to separate
the words in the last part of the name, and I think this is a good convention.
We should rename yarn.scheduler.fair.locality.threshold.node.time.ms to
yarn.scheduler.fair.locality-delay-node-ms (and likewise for the rack
property). We should also rename
yarn.scheduler.fair.continuous.scheduling.enabled to
yarn.scheduler.fair.continuous-scheduling-enabled and
yarn.scheduler.fair.continuous.scheduling.sleep.time.ms to
yarn.scheduler.fair.continuous-scheduling-sleep-ms.

Adding multi-second sleeps to the unit tests will slow down the build and is
still theoretically open to races if the OS pauses the test. It would be better
to use the Clock interface: in the test you can use a MockClock like the one in
TestFairScheduler#testChoiceOfPreemptedContainers, and you can change the start
time in AppSchedulable to come from scheduler.getClock().getTime().
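Injecting a clock lets a test move time forward instantly instead of sleeping. The sketch below defines its own local Clock and MockClock, assumed to mirror the shape of the ones mentioned above. It is not the Hadoop classes themselves. AppSchedulableSketch and hasWaitedAtLeast() are hypothetical, and the clock is passed in directly rather than obtained through scheduler.getClock().

```java
// Local stand-in, assumed to mirror a clock interface with a single getTime().
interface Clock {
  long getTime();
}

// Test clock that advances only when told to; tick() takes seconds (assumption).
class MockClock implements Clock {
  private long time = 0;

  @Override
  public long getTime() {
    return time;
  }

  void tick(int seconds) {
    time += seconds * 1000L;
  }
}

// Hypothetical, stripped-down stand-in for AppSchedulable: the start time is
// read from the injected clock rather than from the system clock.
class AppSchedulableSketch {
  private final Clock clock;
  private final long startTime;

  AppSchedulableSketch(Clock clock) {
    this.clock = clock;
    this.startTime = clock.getTime();
  }

  long getStartTime() {
    return startTime;
  }

  // Hypothetical helper: has at least delayMs elapsed since the app started?
  boolean hasWaitedAtLeast(long delayMs) {
    return clock.getTime() - startTime >= delayMs;
  }
}
```

A test can then call clock.tick(5) and immediately check whether a five-second delay has passed. No real time elapses, so the OS cannot introduce a race.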

> FairScheduler: decouple container scheduling from nodemanager heartbeats
> ------------------------------------------------------------------------
>
>                 Key: YARN-1010
>                 URL: https://issues.apache.org/jira/browse/YARN-1010
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: scheduler
>    Affects Versions: 2.1.0-beta
>            Reporter: Alejandro Abdelnur
>            Assignee: Wei Yan
>            Priority: Critical
>         Attachments: YARN-1010.patch
>
>
> Currently scheduling for a node is done when a node heartbeats.
> For large clusters where the heartbeat interval is set to several seconds, this
> delays scheduling of incoming allocations significantly.
> We could have a continuous loop scanning all nodes and doing scheduling. If
> there is availability, AMs will get the allocation in the next heartbeat after
> the one that placed the request.
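One pass of such a loop might look roughly like the sketch below. It deliberately uses a simplified model: nodes track only free memory, requests are plain memory asks in MB, and placement is first-fit in list order. This is an assumed illustration and does not reproduce the Fair Scheduler's actual ordering by fair share or locality. NodeSketch, ContinuousScanSketch, and schedulePass() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical node model: only free memory (MB) is tracked.
class NodeSketch {
  final String name;
  int freeMb;

  NodeSketch(String name, int freeMb) {
    this.name = name;
    this.freeMb = freeMb;
  }
}

class ContinuousScanSketch {
  // One pass: visit every node, without waiting for its heartbeat, and place
  // each pending ask on the first visited node that still has room. Placed
  // asks are removed from pendingMb; the rest wait for the next pass.
  static List<String> schedulePass(List<NodeSketch> nodes, List<Integer> pendingMb) {
    List<String> placements = new ArrayList<>();
    for (NodeSketch node : nodes) {
      Iterator<Integer> it = pendingMb.iterator();
      while (it.hasNext()) {
        int askMb = it.next();
        if (askMb <= node.freeMb) {
          node.freeMb -= askMb;
          placements.add(node.name + ":" + askMb);
          it.remove();
        }
      }
    }
    return placements;
  }
}
```

Between passes the loop would sleep for the configured interval. An AM then picks up any placements on its next heartbeat, as the description says.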



--
This message was sent by Atlassian JIRA
(v6.1#6144)
