[
https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13745799#comment-13745799
]
Wei Yan commented on YARN-1021:
-------------------------------
[~curino] Thanks for your further comments.
bq. I might have missed this, but can you check whether the
ProportionalCapacityPreemptionPolicy (running as a monitor) gets invoked
correctly when running the simulator with the CapacityScheduler (and with
preemption turned on)?
Sure, I'll check that.
bq. For the use of Clocks in the RM I think it was pretty consistent (if I
remember correctly from our simulator attempt). Also notice that more than a
faster version of time, it is important to achieve discrete event simulation,
as this allows awesome debugging... just accelerating/slowing down time does
not give you that. Please consider this seriously, as I think it would heavily
increase the value of your simulator.
Sorry, I still don't understand why the simulator needs a discrete event simulation approach.
Could you share more info about that?
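For reference, here is a minimal Python sketch of what a discrete event approach looks like, as opposed to simply running wall-clock time faster: events sit in a priority queue keyed by simulated timestamp, and the clock jumps straight to the next event instead of sleeping. The names used here (`SimClock`, `EventLoop`, `make_heartbeat`) are hypothetical and are not taken from the simulator patch or from YARN.

```python
import heapq
import itertools


class SimClock:
    """Simulated clock; time only moves when the event loop advances it."""

    def __init__(self):
        self.now_ms = 0


class EventLoop:
    """Minimal discrete event loop (hypothetical, not the YARN-1021 design)."""

    def __init__(self, clock):
        self.clock = clock
        self._queue = []
        # Tie-breaker so events with equal timestamps run in insertion order.
        self._seq = itertools.count()

    def schedule(self, at_ms, handler):
        heapq.heappush(self._queue, (at_ms, next(self._seq), handler))

    def run(self, until_ms):
        # Process events in timestamp order up to and including until_ms.
        while self._queue and self._queue[0][0] <= until_ms:
            at_ms, _, handler = heapq.heappop(self._queue)
            self.clock.now_ms = at_ms  # jump to the event time, never sleep
            handler(self)


def make_heartbeat(node, interval_ms, log):
    """Build a handler that records a heartbeat and reschedules itself."""

    def heartbeat(loop):
        log.append((loop.clock.now_ms, node))
        loop.schedule(loop.clock.now_ms + interval_ms, heartbeat)

    return heartbeat


clock = SimClock()
loop = EventLoop(clock)
trace = []
loop.schedule(0, make_heartbeat("nm-1", 1000, trace))
loop.schedule(500, make_heartbeat("nm-2", 1000, trace))
loop.run(until_ms=2000)
print(trace)
```

Because the loop handles one event at a time in timestamp order, a debugger can pause between any two events without simulated time drifting, which is something accelerated wall-clock time cannot offer.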
bq. Related to that is consistent replay (as in fully deterministic). This
would be very very valuable to have. I can see this to be invaluable for
debugging, testing, demonstrating features and corner cases etc.
Yes, consistent replay is important in some cases. I'll think about it later.
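As a rough illustration of what fully deterministic replay requires, the Python sketch below drives a toy workload generator from a single seeded random source, so two runs with the same seed yield the same trace. `generate_trace` and its parameters are hypothetical; a real simulator would also need deterministic event ordering and no dependence on wall-clock time.

```python
import random


def generate_trace(seed, num_jobs):
    """Return a list of (submit_time_ms, job_id, num_containers) tuples.

    All randomness comes from one seeded generator (an assumption of this
    sketch), so the same seed always reproduces the same trace.
    """
    rng = random.Random(seed)
    trace = []
    submit_ms = 0
    for job_id in range(num_jobs):
        submit_ms += rng.randint(100, 5000)  # inter-arrival gap in ms
        containers = rng.randint(1, 20)      # number of containers requested
        trace.append((submit_ms, job_id, containers))
    return trace


first = generate_trace(seed=42, num_jobs=5)
replay = generate_trace(seed=42, num_jobs=5)
print(first == replay)
```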
bq. Simulating NM/AM is indeed costly, though it stresses a bunch of other
parts of the system... pros and cons... I would suggest you simply make sure
that your architecture/design does not prevent us from broadening the scope
of the simulation later on.
I agree that the design of the simulator should consider future improvements. I'll
update when I have some ideas.
> Yarn Scheduler Load Simulator
> -----------------------------
>
> Key: YARN-1021
> URL: https://issues.apache.org/jira/browse/YARN-1021
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: scheduler
> Reporter: Wei Yan
> Assignee: Wei Yan
> Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz,
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch,
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf
>
>
> The Yarn Scheduler is a fertile area of interest with different
> implementations, e.g., the Fifo, Capacity and Fair schedulers. Meanwhile,
> several optimizations have been made to improve scheduler performance for
> different scenarios and workloads. Each scheduler algorithm has its own set of
> features, and drives scheduling decisions by many factors, such as fairness,
> capacity guarantee, resource availability, etc. It is very important to
> evaluate a scheduler algorithm thoroughly before we deploy it in a production
> cluster. Unfortunately, it is currently non-trivial to evaluate a scheduling
> algorithm. Evaluating in a real cluster is always time-consuming and costly,
> and it is also very hard to find a large-enough cluster. Hence, a simulator
> that can predict how well a scheduler algorithm performs for a specific
> workload would be quite useful.
> We want to build a Scheduler Load Simulator to simulate large-scale Yarn
> clusters and application loads on a single machine. This would be invaluable
> in furthering Yarn by providing a tool for researchers and developers to
> prototype new scheduler features and predict their behavior and performance
> with a reasonable amount of confidence, thereby aiding rapid innovation.
> The simulator will exercise the real Yarn ResourceManager, removing the
> network factor by simulating NodeManagers and ApplicationMasters via handling
> and dispatching NM/AM heartbeat events from within the same JVM.
> To keep track of scheduler behavior and performance, a scheduler wrapper
> will wrap the real scheduler.
> The simulator will produce real-time metrics while executing, including:
> * Resource usage for the whole cluster and each queue, which can be used to
> configure cluster and queue capacity.
> * The detailed application execution trace (recorded in relation to simulated
> time), which can be analyzed to understand/validate the scheduler behavior
> (individual job turnaround time, throughput, fairness, capacity guarantee,
> etc).
> * Several key metrics of the scheduler algorithm, such as the time cost of each
> scheduler operation (allocate, handle, etc), which can be utilized by Hadoop
> developers to find the code spots and scalability limits.
> The simulator will provide real time charts showing the behavior of the
> scheduler and its performance.
> A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, showing
> how to use the simulator to simulate the Fair Scheduler and the Capacity Scheduler.
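
The scheduler wrapper mentioned in the issue description can be sketched roughly as follows in Python: a wrapper delegates each call to the real scheduler and records how long the call took. `TimingSchedulerWrapper`, `ToyScheduler`, and their method signatures are hypothetical stand-ins, not the classes in the patch or the YARN scheduler interface.

```python
import time
from collections import defaultdict


class ToyScheduler:
    """Stand-in for a real scheduler (hypothetical, for illustration only)."""

    def allocate(self, app_id, asks):
        return ["container-%s-%d" % (app_id, i) for i in range(asks)]

    def handle(self, event):
        return None


class TimingSchedulerWrapper:
    """Delegates to the wrapped scheduler and records per-operation cost."""

    def __init__(self, scheduler):
        self._scheduler = scheduler
        # Maps operation name -> list of observed durations in nanoseconds.
        self.durations_ns = defaultdict(list)

    def _timed(self, name, *args):
        start = time.perf_counter_ns()
        try:
            return getattr(self._scheduler, name)(*args)
        finally:
            # Record the cost even if the wrapped call raises.
            self.durations_ns[name].append(time.perf_counter_ns() - start)

    def allocate(self, app_id, asks):
        return self._timed("allocate", app_id, asks)

    def handle(self, event):
        return self._timed("handle", event)


wrapped = TimingSchedulerWrapper(ToyScheduler())
containers = wrapped.allocate("app_1", 3)
wrapped.handle("NODE_UPDATE")
print(containers, {op: len(t) for op, t in wrapped.durations_ns.items()})
```

Keeping the measurement in a wrapper leaves the wrapped scheduler unmodified, so the same timing logic could in principle be placed around any scheduler implementation.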