JiankunLiu created YARN-2381:
--------------------------------

             Summary: CLONE - Yarn Scheduler Load Simulator
                 Key: YARN-2381
                 URL: https://issues.apache.org/jira/browse/YARN-2381
             Project: Hadoop YARN
          Issue Type: New Feature
          Components: scheduler
            Reporter: JiankunLiu
            Assignee: Wei Yan
             Fix For: 2.3.0


The Yarn Scheduler is a fertile area of interest with different 
implementations, e.g., the Fifo, Capacity and Fair schedulers. Several 
optimizations have also been made to improve scheduler performance for 
different scenarios and workloads. Each scheduler algorithm has its own set of 
features, and drives scheduling decisions by many factors, such as fairness, 
capacity guarantee, resource availability, etc. It is important to evaluate a 
scheduler algorithm thoroughly before we deploy it in a production cluster. 
Unfortunately, evaluating a scheduling algorithm is currently non-trivial. 
Evaluating in a real cluster is always time-consuming and costly, and it is 
also very hard to find a large enough cluster. Hence, a simulator which can 
predict how well a scheduler algorithm performs for some specific workload 
would be quite useful.

We want to build a Scheduler Load Simulator to simulate large-scale Yarn 
clusters and application loads on a single machine. This would be invaluable in 
furthering Yarn by providing a tool for researchers and developers to prototype 
new scheduler features and predict their behavior and performance with a 
reasonable amount of confidence, thereby aiding rapid innovation.

The simulator will exercise the real Yarn ResourceManager. It removes the 
network factor by simulating NodeManagers and ApplicationMasters: NM/AM 
heartbeat events are handled and dispatched from within the same JVM.
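
As a rough illustration of the in-process heartbeat idea, the sketch below 
drives a stand-in resource manager from a single event queue ordered by 
simulated time. It is written in Python for brevity and is not the simulator's 
code: ToyResourceManager, HeartbeatSimulator, the method names and the 
heartbeat intervals are all hypothetical.

```python
import heapq


class ToyResourceManager:
    """Hypothetical stand-in for the ResourceManager; it only records heartbeats."""

    def __init__(self):
        self.log = []  # (simulated_time_ms, "NM" or "AM", entity_id)

    def node_heartbeat(self, now_ms, node_id):
        self.log.append((now_ms, "NM", node_id))

    def app_heartbeat(self, now_ms, app_id):
        self.log.append((now_ms, "AM", app_id))


class HeartbeatSimulator:
    """Dispatches simulated NM/AM heartbeats in one process, with no network."""

    def __init__(self, rm):
        self.rm = rm
        self.now_ms = 0
        self._queue = []  # heap of (fire_time_ms, seq, kind, entity_id, interval_ms)
        self._seq = 0     # tie-breaker so equal fire times keep insertion order

    def _schedule(self, fire_ms, kind, entity_id, interval_ms):
        heapq.heappush(self._queue, (fire_ms, self._seq, kind, entity_id, interval_ms))
        self._seq += 1

    def add_node(self, node_id, interval_ms):
        self._schedule(0, "NM", node_id, interval_ms)

    def add_app(self, app_id, interval_ms):
        self._schedule(0, "AM", app_id, interval_ms)

    def run_until(self, end_ms):
        # Pop events in simulated-time order; the clock jumps, it never sleeps.
        while self._queue and self._queue[0][0] <= end_ms:
            fire_ms, _, kind, entity_id, interval_ms = heapq.heappop(self._queue)
            self.now_ms = fire_ms
            if kind == "NM":
                self.rm.node_heartbeat(fire_ms, entity_id)
            else:
                self.rm.app_heartbeat(fire_ms, entity_id)
            # Each entity re-arms its own next heartbeat.
            self._schedule(fire_ms + interval_ms, kind, entity_id, interval_ms)
```

Because the clock advances only when an event is popped, many simulated nodes 
and applications can be stepped through quickly on one machine in a sketch 
like this.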

To keep track of scheduler behavior and performance, a scheduler wrapper 
will wrap the real scheduler.
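
The wrapper idea can be sketched as a thin delegating object that times every 
call before passing it through. This is an assumption-laden illustration in 
Python, not the proposed implementation: FifoToyScheduler, SchedulerWrapper 
and the shape of the event dictionary are made up, and only the operation 
names allocate and handle come from this description.

```python
import time
from collections import defaultdict


class FifoToyScheduler:
    """Hypothetical scheduler exposing the two operations named in the text."""

    def __init__(self, capacity):
        self.free = capacity

    def allocate(self, app_id, requested):
        # Grant as much as is still free, first come first served.
        granted = min(requested, self.free)
        self.free -= granted
        return granted

    def handle(self, event):
        # Only one made-up event type is understood: a resource release.
        if event.get("type") == "release":
            self.free += event["amount"]


class SchedulerWrapper:
    """Delegates to the wrapped scheduler and records per-operation cost."""

    def __init__(self, scheduler, clock=time.perf_counter):
        self._scheduler = scheduler
        self._clock = clock
        self.call_counts = defaultdict(int)
        self.total_seconds = defaultdict(float)

    def _timed(self, name, *args, **kwargs):
        start = self._clock()
        try:
            return getattr(self._scheduler, name)(*args, **kwargs)
        finally:
            # Record the call even if the wrapped scheduler raised.
            self.call_counts[name] += 1
            self.total_seconds[name] += self._clock() - start

    def allocate(self, app_id, requested):
        return self._timed("allocate", app_id, requested)

    def handle(self, event):
        return self._timed("handle", event)

    def mean_seconds(self, name):
        calls = self.call_counts[name]
        return self.total_seconds[name] / calls if calls else 0.0
```

Callers use the wrapper exactly as they would the scheduler, so the measured 
scheduler needs no changes; for example, 
SchedulerWrapper(FifoToyScheduler(10)).allocate("a1", 6) returns the same 
grant the bare scheduler would.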

The simulator will produce real-time metrics while executing, including:

* Resource usage for the whole cluster and for each queue, which can be used to 
configure cluster and queue capacities.
* The detailed application execution trace (recorded in relation to simulated 
time), which can be analyzed to understand/validate the scheduler behavior 
(individual job turnaround time, throughput, fairness, capacity guarantee, 
etc).
* Several key metrics of the scheduler algorithm, such as the time cost of each 
scheduler operation (allocate, handle, etc), which can be used by Hadoop 
developers to find hot spots in the code and scalability limits.
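
To make the trace-based metrics above concrete, here is a small sketch of how 
turnaround time and throughput might be derived from an execution trace. The 
record layout (AppRecord with app_id, queue, submit_ms, finish_ms) is an 
assumption for illustration only; this description does not specify the 
simulator's trace format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AppRecord:
    """Hypothetical trace entry; times are in simulated milliseconds."""
    app_id: str
    queue: str
    submit_ms: int
    finish_ms: int


def turnaround_ms(record):
    # Turnaround is the time from submission to completion.
    return record.finish_ms - record.submit_ms


def summarize(trace):
    """Return mean turnaround (overall and per queue) and overall throughput."""
    if not trace:
        return {"mean_turnaround_ms": 0.0, "per_queue_ms": {}, "apps_per_second": 0.0}

    per_queue = defaultdict(list)
    for record in trace:
        per_queue[record.queue].append(turnaround_ms(record))

    # Throughput is measured over the span from first submit to last finish.
    span_ms = max(r.finish_ms for r in trace) - min(r.submit_ms for r in trace)
    return {
        "mean_turnaround_ms": sum(turnaround_ms(r) for r in trace) / len(trace),
        "per_queue_ms": {q: sum(v) / len(v) for q, v in per_queue.items()},
        "apps_per_second": len(trace) / (span_ms / 1000) if span_ms > 0 else 0.0,
    }
```

Comparing the per-queue means between two schedulers run on the same trace is 
one simple way such output could be used to check fairness or capacity 
guarantees.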

The simulator will provide real-time charts showing the behavior of the 
scheduler and its performance.

A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, showing 
how to use the simulator to simulate the Fair Scheduler and Capacity Scheduler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
