[ 
https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281276#comment-14281276
 ] 

Fabio Colzada commented on YARN-1021:
-------------------------------------

Hi, I am working with SLS on Hadoop 2.6.0. I really need it, but I'm struggling 
to get it running properly. The simulation completes on the command line and I 
can see the log entries on screen, but:

1- The web interface is not working. Near the start of the run I see this 
exception on screen:
java.lang.NullPointerException
        at org.apache.hadoop.yarn.sls.web.SLSWebApp.<init>(SLSWebApp.java:86)
        at org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.initMetrics(ResourceSchedulerWrapper.java:477)
        at org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.setConf(ResourceSchedulerWrapper.java:176)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createScheduler(ResourceManager.java:291)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:484)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:989)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:255)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.sls.SLSRunner.startRM(SLSRunner.java:167)
        at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:141)
        at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:528)

I'm not sure which object is null, but the sls/html folder contains the 
expected files.
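
One thing that could be checked here is whether the files under sls/html are 
actually visible on the JVM classpath, since a class-loader lookup returns null 
for a resource it cannot find. The sketch below is only a diagnostic under the 
assumption that the web app loads its files through the context class loader; 
the trace does not confirm that. The class name ResourceCheck and the default 
resource name are hypothetical placeholders, so pass the real file names from 
sls/html as arguments.

```java
import java.net.URL;

class ResourceCheck {
    // Returns true when the named resource is visible to the context class loader.
    static boolean isOnClasspath(String name) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ResourceCheck.class.getClassLoader();
        }
        URL url = cl.getResource(name); // null when the resource is not found
        return url != null;
    }

    public static void main(String[] args) {
        // Hypothetical default name; replace it with real file names from sls/html.
        String[] names = args.length > 0 ? args : new String[] {"example.html.template"};
        for (String name : names) {
            String status = isOnClasspath(name) ? "found" : "NOT on classpath";
            System.out.println(name + " -> " + status);
        }
    }
}
```

Running it with the same classpath as the simulator would show whether a file 
that exists on disk is still missing from the classpath.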

2- I don't get the files realtimetrack.json or jobruntime.csv, although the 
metrics folder is correctly populated. I see some recurring exceptions; I don't 
know if they are related, since they don't prevent the simulation from 
terminating:
java.lang.NullPointerException
        at org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.addAMRuntime(ResourceSchedulerWrapper.java:735)
        at org.apache.hadoop.yarn.sls.appmaster.AMSimulator.lastStep(AMSimulator.java:193)
        at org.apache.hadoop.yarn.sls.appmaster.MRAMSimulator.lastStep(MRAMSimulator.java:396)
        at org.apache.hadoop.yarn.sls.scheduler.TaskRunner$Task.run(TaskRunner.java:100)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

and also

Exception in thread "pool-5-thread-374" java.lang.NullPointerException
        at org.apache.hadoop.yarn.sls.scheduler.TaskRunner$Task.run(TaskRunner.java:104)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
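
Items 1 and 2 might be connected. The first trace is thrown inside initMetrics, 
yet the simulation keeps running, so that failure is evidently reported rather 
than fatal. If the writer behind jobruntime.csv were created later in the same 
initialization (an assumption, not something the traces show), it would stay 
null and every later use would throw. A minimal sketch of that pattern, with 
hypothetical names throughout (InitOrderSketch, runtimeLog, record):

```java
class InitOrderSketch {
    // Stands in for a log writer that is created late in initialization.
    private StringBuilder runtimeLog;

    InitOrderSketch(boolean earlyStepFails) {
        try {
            init(earlyStepFails);
        } catch (Exception e) {
            // The failure is reported but not rethrown, so the object stays in use.
            System.err.println("init failed: " + e);
        }
    }

    private void init(boolean earlyStepFails) {
        if (earlyStepFails) {
            // Hypothetical early step, for example starting a web UI.
            throw new NullPointerException("early init step");
        }
        runtimeLog = new StringBuilder(); // never reached when the early step throws
    }

    boolean canRecord() {
        return runtimeLog != null;
    }

    void record(String line) {
        // Throws NullPointerException if init aborted before creating the log.
        runtimeLog.append(line).append('\n');
    }

    public static void main(String[] args) {
        InitOrderSketch sim = new InitOrderSketch(true);
        System.out.println("can record: " + sim.canRecord());
    }
}
```

If that is what happens here, fixing the first exception would also make the 
recurring ones disappear.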

Any help is really appreciated.

> Yarn Scheduler Load Simulator
> -----------------------------
>
>                 Key: YARN-1021
>                 URL: https://issues.apache.org/jira/browse/YARN-1021
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: scheduler
>            Reporter: Wei Yan
>            Assignee: Wei Yan
>             Fix For: 2.3.0
>
>         Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf
>
>
> The Yarn Scheduler is a fertile area of interest with different 
> implementations, e.g., Fifo, Capacity and Fair  schedulers. Meanwhile, 
> several optimizations are also made to improve scheduler performance for 
> different scenarios and workload. Each scheduler algorithm has its own set of 
> features, and drives scheduling decisions by many factors, such as fairness, 
> capacity guarantee, resource availability, etc. It is very important to 
> evaluate a scheduler algorithm very well before we deploy it in a production 
> cluster. Unfortunately, currently it is non-trivial to evaluate a scheduling 
> algorithm. Evaluating in a real cluster is always time and cost consuming, 
> and it is also very hard to find a large-enough cluster. Hence, a simulator 
> which can predict how well a scheduler algorithm performs for some specific 
> workload would be quite useful.
> We want to build a Scheduler Load Simulator to simulate large-scale Yarn 
> clusters and application loads in a single machine. This would be invaluable 
> in furthering Yarn by providing a tool for researchers and developers to 
> prototype new scheduler features and predict their behavior and performance 
> with a reasonable amount of confidence, thereby aiding rapid innovation.
> The simulator will exercise the real Yarn ResourceManager removing the 
> network factor by simulating NodeManagers and ApplicationMasters via handling 
> and dispatching NM/AMs heartbeat events from within the same JVM.
> To keep track of scheduler behavior and performance, a scheduler wrapper 
> will wrap the real scheduler.
> The simulator will produce real time metrics while executing, including:
> * Resource usages for whole cluster and each queue, which can be utilized to 
> configure cluster and queue's capacity.
> * The detailed application execution trace (recorded in relation to simulated 
> time), which can be analyzed to understand/validate the  scheduler behavior 
> (individual jobs turn around time, throughput, fairness, capacity guarantee, 
> etc).
> * Several key metrics of scheduler algorithm, such as time cost of each 
> scheduler operation (allocate, handle, etc), which can be utilized by Hadoop 
> developers to find the code spots and scalability limits.
> The simulator will provide real time charts showing the behavior of the 
> scheduler and its performance.
> A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, 
> showing how to use the simulator to simulate Fair Scheduler and Capacity 
> Scheduler.



