[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833259#comment-16833259 ]
NedaMaleki commented on YARN-1021:
----------------------------------

*Dear Wei Yan,*

*I use Hadoop 2.4.1. When I try to run SLS, I run into the same problem as YukunTsang:*

19/05/05 11:54:45 INFO capacity.CapacityScheduler: Added node a2116.smile.com:3 clusterResource: <memory:40960, vCores:40>
Exception in thread "main" java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
    at org.apache.hadoop.yarn.sls.SLSRunner.startAMFromRumenTraces(SLSRunner.java:394)
    at org.apache.hadoop.yarn.sls.SLSRunner.startAM(SLSRunner.java:246)
    at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:141)
    at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:524)
Caused by: java.lang.NullPointerException
    at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:123)
    ... 4 more

*After waiting a few minutes, I got the following messages, and then nothing more :(*

19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: Expired:a2115.smile.com:0 Timed out after 600 secs
19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: Expired:a2118.smile.com:1 Timed out after 600 secs
19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: Expired:a2117.smile.com:2 Timed out after 600 secs
19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: Expired:a2116.smile.com:3 Timed out after 600 secs
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2115.smile.com:0 as it is now LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2115.smile.com:0 Node Transitioned from RUNNING to LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2118.smile.com:1 as it is now LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2118.smile.com:1 Node Transitioned from RUNNING to LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2117.smile.com:2 as it is now LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2117.smile.com:2 Node Transitioned from RUNNING to LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2116.smile.com:3 as it is now LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2116.smile.com:3 Node Transitioned from RUNNING to LOST
19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node a2115.smile.com:0 clusterResource: <memory:30720, vCores:30>
19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node a2118.smile.com:1 clusterResource: <memory:20480, vCores:20>
19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node a2117.smile.com:2 clusterResource: <memory:10240, vCores:10>
19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node a2116.smile.com:3 clusterResource: <memory:0, vCores:0>

*I noticed that when the cluster resource reaches <memory:40960, vCores:40>, it throws the exception, and I do not know why.*

*1) I look forward to hearing from you, as I am stuck here!*

*2) My second question: where can I extend SLS, i.e., where should I write my scheduler code in SLS, run it, and get results? (I need to simulate my scheduler and then compare it with other schedulers such as FIFO, Fair, and Capacity.)*

*Thanks a lot,*
*Neda*

> Yarn Scheduler Load Simulator
> -----------------------------
>
> Key: YARN-1021
> URL: https://issues.apache.org/jira/browse/YARN-1021
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: scheduler
> Reporter: Wei Yan
> Assignee: Wei Yan
> Priority: Major
> Fix For: 2.3.0
>
> Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz,
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch,
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch,
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch,
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf
>
>
> The Yarn Scheduler is a fertile area of interest with different
> implementations, e.g., FIFO, Capacity and Fair schedulers.
> Meanwhile, several optimizations have also been made to improve scheduler
> performance for different scenarios and workloads. Each scheduler algorithm
> has its own set of features and drives scheduling decisions by many factors,
> such as fairness, capacity guarantees, and resource availability. It is very
> important to evaluate a scheduler algorithm thoroughly before we deploy it in
> a production cluster. Unfortunately, it is currently non-trivial to evaluate
> a scheduling algorithm. Evaluating in a real cluster is always time- and
> cost-consuming, and it is also very hard to find a large-enough cluster.
> Hence, a simulator that can predict how well a scheduler algorithm performs
> for a specific workload would be quite useful.
> We want to build a Scheduler Load Simulator to simulate large-scale Yarn
> clusters and application loads on a single machine. This would be invaluable
> in furthering Yarn by providing a tool for researchers and developers to
> prototype new scheduler features and predict their behavior and performance
> with a reasonable amount of confidence, thereby aiding rapid innovation.
> The simulator will exercise the real Yarn ResourceManager, removing the
> network factor by simulating NodeManagers and ApplicationMasters: it handles
> and dispatches NM/AM heartbeat events from within the same JVM.
> To keep track of scheduler behavior and performance, a scheduler wrapper
> will wrap the real scheduler.
> The simulator will produce real-time metrics while executing, including:
> * Resource usage for the whole cluster and for each queue, which can be used
>   to configure the capacity of the cluster and its queues.
> * The detailed application execution trace (recorded relative to simulated
>   time), which can be analyzed to understand/validate the scheduler behavior
>   (individual jobs' turnaround time, throughput, fairness, capacity
>   guarantees, etc.).
> * Several key metrics of the scheduler algorithm, such as the time cost of
>   each scheduler operation (allocate, handle, etc.), which Hadoop developers
>   can use to find hot spots in the code and scalability limits.
> The simulator will provide real-time charts showing the behavior of the
> scheduler and its performance.
> A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE,
> showing how to use the simulator to simulate the Fair Scheduler and the
> Capacity Scheduler.
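The "Caused by" frames in the stack trace in the comment above end in ConcurrentHashMap.get, called from ReflectionUtils.newInstance. ConcurrentHashMap does not accept null keys, so one plausible reading (an assumption, not a confirmed diagnosis) is that a null Class reached newInstance, for example because a job-type lookup found no registered entry. The Java sketch below reproduces only that failure mode in isolation. ClassCacheDemo, its CONSTRUCTOR_CACHE, lookup, and DemoAmSimulator are hypothetical stand-ins, not Hadoop's code.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical class to instantiate reflectively (stands in for an AM simulator).
class DemoAmSimulator { }

// Hypothetical reflection helper with a constructor cache. It mirrors the shape of
// the trace (NPE from ConcurrentHashMap.get wrapped in a RuntimeException); it is
// not Hadoop's ReflectionUtils.
final class ClassCacheDemo {
    // ConcurrentHashMap rejects null keys: get(null) throws NullPointerException.
    private static final Map<Class<?>, Constructor<?>> CONSTRUCTOR_CACHE =
            new ConcurrentHashMap<>();

    static <T> T newInstance(Class<T> theClass) {
        try {
            @SuppressWarnings("unchecked")
            Constructor<T> ctor = (Constructor<T>) CONSTRUCTOR_CACHE.get(theClass);
            if (ctor == null) {
                ctor = theClass.getDeclaredConstructor();
                CONSTRUCTOR_CACHE.put(theClass, ctor);
            }
            return ctor.newInstance();
        } catch (Exception e) {
            // Every failure, including the NPE from a null key, is wrapped.
            throw new RuntimeException(e);
        }
    }

    // Hypothetical registry lookup: returns null when the type was never registered.
    static Class<?> lookup(Map<String, Class<?>> registry, String type) {
        return registry.get(type);
    }
}

final class ClassCacheDemoMain {
    public static void main(String[] args) {
        Map<String, Class<?>> amTypes = new HashMap<>();
        amTypes.put("demo", DemoAmSimulator.class);
        Object am = ClassCacheDemo.newInstance(ClassCacheDemo.lookup(amTypes, "demo"));
        System.out.println(am.getClass().getSimpleName()); // prints DemoAmSimulator
        try {
            // "unregistered-type" has no entry, so lookup returns null.
            ClassCacheDemo.newInstance(ClassCacheDemo.lookup(amTypes, "unregistered-type"));
        } catch (RuntimeException e) {
            System.out.println(e.getCause().getClass().getName()); // prints java.lang.NullPointerException
        }
    }
}
```

Checking the looked-up Class for null before calling newInstance, and failing with a message that names the missing type, would turn this into a readable error instead of a bare NullPointerException.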
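The second log excerpt in the comment above shows all four nodes expiring after 600 seconds without a heartbeat, with the cluster total falling by <memory:10240, vCores:10> per removed node (40960, 30720, 20480, 10240, then 0 MB). That step size implies each simulated node contributed 10240 MB and 10 vCores. A minimal Java sketch of this bookkeeping follows, assuming a fixed expiry interval and equal-sized nodes; LivenessTracker and its methods are hypothetical names, not the ResourceManager's AbstractLivelinessMonitor API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical liveness tracker: nodes that have not heartbeated within the expiry
// interval are declared lost, and their resources leave the cluster total.
final class LivenessTracker {
    private final long expiryMs;
    private final Map<String, Long> lastSeenMs = new LinkedHashMap<>();
    private final Map<String, int[]> nodeResources = new LinkedHashMap<>(); // {memoryMb, vCores}
    private long clusterMemoryMb;
    private int clusterVCores;

    LivenessTracker(long expiryMs) { this.expiryMs = expiryMs; }

    void register(String node, int memoryMb, int vCores, long nowMs) {
        lastSeenMs.put(node, nowMs);
        nodeResources.put(node, new int[] {memoryMb, vCores});
        clusterMemoryMb += memoryMb;
        clusterVCores += vCores;
    }

    // Records a heartbeat; unknown nodes are ignored.
    void heartbeat(String node, long nowMs) {
        lastSeenMs.computeIfPresent(node, (n, previous) -> nowMs);
    }

    // Removes every node whose last heartbeat is more than expiryMs old.
    List<String> expire(long nowMs) {
        List<String> lost = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastSeenMs.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMs - entry.getValue() > expiryMs) {
                it.remove();
                int[] r = nodeResources.remove(entry.getKey());
                clusterMemoryMb -= r[0];
                clusterVCores -= r[1];
                lost.add(entry.getKey());
            }
        }
        return lost;
    }

    long clusterMemoryMb() { return clusterMemoryMb; }
    int clusterVCores() { return clusterVCores; }
}

final class LivenessDemo {
    public static void main(String[] args) {
        LivenessTracker tracker = new LivenessTracker(600_000L); // 600 s, as in the log
        for (String node : new String[] {"a2115.smile.com:0", "a2118.smile.com:1",
                "a2117.smile.com:2", "a2116.smile.com:3"}) {
            tracker.register(node, 10_240, 10, 0L);
        }
        System.out.println(tracker.clusterMemoryMb());       // prints 40960
        System.out.println(tracker.expire(600_001L).size()); // prints 4
        System.out.println(tracker.clusterMemoryMb());       // prints 0
    }
}
```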
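The quoted description says the simulator removes the network factor by handling and dispatching NM/AM heartbeat events inside one JVM. One way to model that is a discrete-event loop: each simulated NodeManager or ApplicationMaster is a periodic task with a next-fire time on a simulated clock, and a priority queue dispatches the earliest heartbeat first. This is sketched here as an assumption, not as SLS's actual implementation; HeartbeatLoop, SimTask, and the intervals used are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.LongConsumer;

// Hypothetical discrete-event loop: simulated NMs/AMs are periodic tasks whose
// heartbeats are dispatched in simulated-time order inside a single JVM.
final class HeartbeatLoop {
    private static final class SimTask {
        final String name;
        final long intervalMs;
        final LongConsumer onHeartbeat; // called with the simulated time of the beat
        final long seq;                 // tie-breaker: equal times run in registration order
        long nextFireMs;

        SimTask(String name, long startMs, long intervalMs, LongConsumer onHeartbeat, long seq) {
            this.name = name;
            this.nextFireMs = startMs;
            this.intervalMs = intervalMs;
            this.onHeartbeat = onHeartbeat;
            this.seq = seq;
        }
    }

    private final PriorityQueue<SimTask> queue = new PriorityQueue<>(
            Comparator.<SimTask>comparingLong(t -> t.nextFireMs).thenComparingLong(t -> t.seq));
    private long nextSeq;

    void schedule(String name, long startMs, long intervalMs, LongConsumer onHeartbeat) {
        if (intervalMs <= 0) throw new IllegalArgumentException("intervalMs must be positive");
        queue.add(new SimTask(name, startMs, intervalMs, onHeartbeat, nextSeq++));
    }

    // Dispatches heartbeats in time order until the next one would fall after endMs.
    // Returns one "time name" entry per dispatched heartbeat.
    List<String> runUntil(long endMs) {
        List<String> dispatched = new ArrayList<>();
        while (!queue.isEmpty() && queue.peek().nextFireMs <= endMs) {
            SimTask task = queue.poll();
            task.onHeartbeat.accept(task.nextFireMs);
            dispatched.add(task.nextFireMs + " " + task.name);
            task.nextFireMs += task.intervalMs; // mutate only while out of the queue
            queue.add(task);
        }
        return dispatched;
    }
}

final class HeartbeatDemo {
    public static void main(String[] args) {
        HeartbeatLoop loop = new HeartbeatLoop();
        loop.schedule("nm1", 0L, 1_000L, nowMs -> { });
        loop.schedule("am1", 500L, 1_000L, nowMs -> { });
        // prints, one per line: 0 nm1, 500 am1, 1000 nm1, 1500 am1, 2000 nm1
        loop.runUntil(2_000L).forEach(System.out::println);
    }
}
```

Ordering by (simulated time, registration order) keeps runs deterministic, which makes repeated scheduler comparisons on the same workload reproducible.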
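The scheduler wrapper and the "time cost of each scheduler operation (allocate, handle, etc.)" metric from the quoted description fit a decorator: the wrapper implements the same interface as the real scheduler, forwards every call, and records how long each operation took. The Java sketch below uses SimpleScheduler, TimingSchedulerWrapper, and toy allocate/handle signatures, all hypothetical simplifications rather than YARN's ResourceScheduler interface.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical, much-simplified scheduler interface (not YARN's ResourceScheduler).
interface SimpleScheduler {
    int allocate(String appId, int containers); // returns the number of containers granted
    void handle(String event);
}

// Decorator that forwards to the real scheduler and records per-operation
// call counts and total elapsed time. Single-threaded sketch.
final class TimingSchedulerWrapper implements SimpleScheduler {
    private final SimpleScheduler delegate;
    private final Map<String, long[]> stats = new LinkedHashMap<>(); // {calls, totalNanos}

    TimingSchedulerWrapper(SimpleScheduler delegate) { this.delegate = delegate; }

    private <T> T timed(String op, Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            long[] s = stats.computeIfAbsent(op, k -> new long[2]);
            s[0]++;
            s[1] += System.nanoTime() - start;
        }
    }

    @Override public int allocate(String appId, int containers) {
        return timed("allocate", () -> delegate.allocate(appId, containers));
    }

    @Override public void handle(String event) {
        timed("handle", () -> { delegate.handle(event); return null; });
    }

    long calls(String op) {
        long[] s = stats.get(op);
        return s == null ? 0 : s[0];
    }

    double meanMicros(String op) {
        long[] s = stats.get(op);
        return (s == null || s[0] == 0) ? 0.0 : s[1] / 1_000.0 / s[0];
    }
}

final class WrapperDemo {
    public static void main(String[] args) {
        // Toy scheduler: grants at most 4 containers per call.
        SimpleScheduler toy = new SimpleScheduler() {
            @Override public int allocate(String appId, int containers) { return Math.min(containers, 4); }
            @Override public void handle(String event) { }
        };
        TimingSchedulerWrapper wrapped = new TimingSchedulerWrapper(toy);
        System.out.println(wrapped.allocate("app_1", 10)); // prints 4
        wrapped.handle("NODE_UPDATE");
        // prints "1 allocate, 1 handle"
        System.out.println(wrapped.calls("allocate") + " allocate, " + wrapped.calls("handle") + " handle");
        System.out.printf("mean allocate: %.3f us%n", wrapped.meanMicros("allocate"));
    }
}
```

Because the wrapper exposes the same interface as its delegate, the same timing code can sit in front of each scheduler being compared without changing the schedulers themselves.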