He Tianyi created YARN-5479:
-------------------------------

             Summary: FairScheduler: Scheduling performance improvement
                 Key: YARN-5479
                 URL: https://issues.apache.org/jira/browse/YARN-5479
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: fairscheduler, resourcemanager
    Affects Versions: 2.6.0
            Reporter: He Tianyi


Currently ResourceManager uses a single thread to handle async events for 
scheduling. As the number of nodes grows, more events need to be processed in 
time in FairScheduler. Also, an increased number of applications and queues 
slows down the processing of each individual event. 

There are two cases in which slow processing of nodeUpdate events is problematic:
A. Global throughput is lower than the number of node heartbeats per round. 
This keeps resources from being allocated, because the scheduler cannot keep up.
B. Global throughput is sufficient, but in some rounds the events of some 
nodes cannot be processed before their next heartbeat. This makes handling of 
burst requests inefficient (e.g. a newly submitted MapReduce application cannot 
get all of its tasks launched quickly even when enough resources are available).

I am fairly sure people will eventually hit this problem once a single 
cluster is scaled to several thousand nodes (even with {{assignmultiple}} 
enabled).

This issue proposes several performance optimizations in the FairScheduler 
{{nodeUpdate}} method. To be specific:
A. Trading off fairness for efficiency, queue and app sorting can be skipped 
(or should this be called 'delayed sorting'?). We can either start a dedicated 
thread to do the sorting and updating, or re-sort only after the current 
result has been used several times (say, sort once in every 100 calls).
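
To make the 'sort once in every N calls' variant of A concrete, here is a 
minimal sketch. {{DelayedSorter}}, {{resortInterval}} and {{sortCount}} are 
hypothetical names made up for illustration and are not existing FairScheduler 
code; thread safety is deliberately left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: serves a cached ordering and re-sorts every N calls. */
final class DelayedSorter<T> {
    private final List<T> items;                 // live list of queues or apps
    private final Comparator<? super T> comparator;
    private final int resortInterval;            // e.g. 100, as suggested above
    private List<T> cached = new ArrayList<>();  // last sorted snapshot
    private int callsSinceSort;
    private int sortCount;                       // only for observing the effect

    DelayedSorter(List<T> items, Comparator<? super T> comparator, int resortInterval) {
        if (resortInterval < 1) {
            throw new IllegalArgumentException("resortInterval must be >= 1");
        }
        this.items = items;
        this.comparator = comparator;
        this.resortInterval = resortInterval;
        this.callsSinceSort = resortInterval;    // force a sort on the first call
    }

    /** Returns the cached ordering; it may be stale by up to N-1 calls. */
    List<T> sorted() {
        if (callsSinceSort >= resortInterval) {
            cached = new ArrayList<>(items);     // snapshot, then sort the copy
            cached.sort(comparator);
            callsSinceSort = 0;
            sortCount++;
        }
        callsSinceSort++;
        return cached;
    }

    int sortCount() {
        return sortCount;
    }
}
```

The price is that allocation decisions between two sorts are made against a 
slightly stale order, which is exactly the fairness vs. efficiency trade-off 
mentioned above.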

B. Performing calculations on {{Resource}} instances is expensive, since at 
least 2 objects ({{ResourceImpl}} and its proto builder) are created each time 
(when using the 'immutable' APIs). The overhead can be eliminated with a 
lightweight implementation of Resource that does not instantiate a builder 
until necessary, because most instances are used as intermediate results in 
the scheduler instead of being exchanged via IPC. Also, {{createResource}} 
uses reflection, which can be replaced by a plain {{new}} (for scheduler usage 
only). Furthermore, perhaps we could 'intern' resources to avoid allocation.
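
A rough sketch of what the lightweight resource in B could look like follows. 
{{LightResource}} and {{WireForm}} are hypothetical names; {{WireForm}} is only 
a stand-in for the proto builder, and none of this is the actual 
{{ResourceImpl}} code.

```java
/** Hypothetical scheduler-only resource; not part of the Hadoop API. */
final class LightResource {
    /** Stand-in for the protobuf builder that is costly to create. */
    static final class WireForm {
        final int memoryMb;
        final int vcores;

        WireForm(int memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    private int memoryMb;
    private int vcores;
    private WireForm wire;   // built lazily, only when needed for IPC
    private int wireBuilds;  // counts builds, only for observing the effect

    LightResource(int memoryMb, int vcores) {  // plain constructor, no reflection
        this.memoryMb = memoryMb;
        this.vcores = vcores;
    }

    int getMemoryMb() { return memoryMb; }

    int getVcores() { return vcores; }

    /** Adds in place; no intermediate object is allocated. */
    LightResource addTo(LightResource other) {
        memoryMb += other.memoryMb;
        vcores += other.vcores;
        wire = null;         // cached wire form is now stale
        return this;
    }

    /** Subtracts in place; no intermediate object is allocated. */
    LightResource subtractFrom(LightResource other) {
        memoryMb -= other.memoryMb;
        vcores -= other.vcores;
        wire = null;
        return this;
    }

    boolean fitsIn(LightResource available) {
        return memoryMb <= available.memoryMb && vcores <= available.vcores;
    }

    /** Creates the wire form on first use and caches it until the next change. */
    WireForm toWire() {
        if (wire == null) {
            wire = new WireForm(memoryMb, vcores);
            wireBuilds++;
        }
        return wire;
    }

    int wireBuilds() { return wireBuilds; }
}
```

Intermediate results inside the scheduler would then never touch the wire 
form; it is only built when an instance actually leaves the process.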

C. Other minor changes, such as moving the {{updateRootMetrics}} call to 
{{update}}, making root queue metrics eventually consistent (which may satisfy 
most needs), or introducing counters for {{getResourceUsage}} so that resource 
usage is updated incrementally instead of being recalculated each time.
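
The counter idea in C could be sketched as below, assuming usage is tracked per 
queue and every change is pushed up the queue hierarchy as a delta. 
{{UsageNode}} and its methods are hypothetical and memory-only for brevity; 
this is not the existing queue implementation.

```java
/** Hypothetical queue node that keeps a running usage counter. */
final class UsageNode {
    private final UsageNode parent;  // null for the root queue
    private long usedMemoryMb;       // maintained incrementally

    UsageNode(UsageNode parent) {
        this.parent = parent;
    }

    /** Applies a delta to this node and every ancestor: O(depth) per change. */
    private void applyDelta(long deltaMb) {
        for (UsageNode node = this; node != null; node = node.parent) {
            node.usedMemoryMb += deltaMb;
        }
    }

    void allocate(long memoryMb) {
        applyDelta(memoryMb);
    }

    void release(long memoryMb) {
        applyDelta(-memoryMb);
    }

    /** O(1) read; no walk over child queues or apps. */
    long getUsedMemoryMb() {
        return usedMemoryMb;
    }
}
```

Reads become constant time, at the cost of a short walk up the hierarchy on 
each allocation or release.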

With A and B, I was looking at a 4x improvement in a cluster with 2K nodes.

Suggestions? Opinions?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
