[ 
https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833259#comment-16833259
 ] 

NedaMaleki commented on YARN-1021:
----------------------------------

*Dear Wei Yan,*

*I am using Hadoop 2.4.1. When I try to run SLS, I hit the same problem as 
YukunTsang:*

19/05/05 11:54:45 INFO capacity.CapacityScheduler: Added node a2116.smile.com:3 clusterResource: <memory:40960, vCores:40>
Exception in thread "main" java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
    at org.apache.hadoop.yarn.sls.SLSRunner.startAMFromRumenTraces(SLSRunner.java:394)
    at org.apache.hadoop.yarn.sls.SLSRunner.startAM(SLSRunner.java:246)
    at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:141)
    at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:524)
Caused by: java.lang.NullPointerException
    at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:123)
    ... 4 more
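
A note on reading the trace above: the innermost frame is ConcurrentHashMap.get, and that method throws a NullPointerException when it is given a null key. This suggests that a null value, most likely a null class, reached the lookup inside ReflectionUtils.newInstance. That is an inference from the trace only and has not been checked against the Hadoop source. The sketch below reproduces the same exception shape in isolation; the class NullKeyLookup, its CACHE map, and the describe method are hypothetical names made up for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class NullKeyLookup {
    // Hypothetical cache keyed by class; stands in for whatever map the trace passes through.
    private static final Map<Class<?>, String> CACHE = new ConcurrentHashMap<>();

    // Looks up a class in the cache and rethrows any failure as a RuntimeException,
    // which gives the same "RuntimeException caused by NullPointerException" shape.
    static String describe(Class<?> type) {
        try {
            String cached = CACHE.get(type); // throws NullPointerException if type is null
            return cached != null ? cached : type.getName();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // A non-null key is looked up without error.
        System.out.println(describe(String.class));
        // A null key fails inside ConcurrentHashMap.get.
        try {
            describe(null);
        } catch (RuntimeException e) {
            System.out.println("cause: " + e.getCause().getClass().getName());
        }
    }
}
```

If the same pattern applies here, the thing to check would be why the value handed to newInstance at SLSRunner.java:394 is null.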

*After waiting a few minutes, I got the following messages and then nothing else :(*

19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: Expired:a2115.smile.com:0 Timed out after 600 secs
19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: Expired:a2118.smile.com:1 Timed out after 600 secs
19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: Expired:a2117.smile.com:2 Timed out after 600 secs
19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: Expired:a2116.smile.com:3 Timed out after 600 secs
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2115.smile.com:0 as it is now LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2115.smile.com:0 Node Transitioned from RUNNING to LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2118.smile.com:1 as it is now LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2118.smile.com:1 Node Transitioned from RUNNING to LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2117.smile.com:2 as it is now LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2117.smile.com:2 Node Transitioned from RUNNING to LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2116.smile.com:3 as it is now LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2116.smile.com:3 Node Transitioned from RUNNING to LOST
19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node a2115.smile.com:0 clusterResource: <memory:30720, vCores:30>
19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node a2118.smile.com:1 clusterResource: <memory:20480, vCores:20>
19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node a2117.smile.com:2 clusterResource: <memory:10240, vCores:10>
19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node a2116.smile.com:3 clusterResource: <memory:0, vCores:0>

*I noticed that the exception is thrown right after the cluster resource 
reaches <memory:40960, vCores:40>, and I do not know why.*

 *1) I look forward to hearing from you, as I am stuck here!*

*2) My second question: where can I extend SLS? That is, where should I write 
my scheduler code in SLS so that I can run it and get results? (I need to 
simulate my scheduler and then compare it with other schedulers such as FIFO, 
Fair, and Capacity.)*

*Thanks a lot,*

 *Neda*

> Yarn Scheduler Load Simulator
> -----------------------------
>
>                 Key: YARN-1021
>                 URL: https://issues.apache.org/jira/browse/YARN-1021
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: scheduler
>            Reporter: Wei Yan
>            Assignee: Wei Yan
>            Priority: Major
>             Fix For: 2.3.0
>
>         Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf
>
>
> The Yarn Scheduler is a fertile area of interest with different 
> implementations, e.g., the Fifo, Capacity and Fair schedulers. Meanwhile, 
> several optimizations have also been made to improve scheduler performance 
> for different scenarios and workloads. Each scheduler algorithm has its own 
> set of features, and drives scheduling decisions by many factors, such as 
> fairness, capacity guarantee, resource availability, etc. It is very 
> important to evaluate a scheduler algorithm thoroughly before we deploy it in 
> a production cluster. Unfortunately, it is currently non-trivial to evaluate 
> a scheduling algorithm. Evaluating in a real cluster is always time-consuming 
> and costly, and it is also very hard to find a large-enough cluster. Hence, a 
> simulator that can predict how well a scheduler algorithm performs for some 
> specific workload would be quite useful.
> We want to build a Scheduler Load Simulator to simulate large-scale Yarn 
> clusters and application loads on a single machine. This would be invaluable 
> in furthering Yarn by providing a tool for researchers and developers to 
> prototype new scheduler features and predict their behavior and performance 
> with a reasonable amount of confidence, thereby aiding rapid innovation.
> The simulator will exercise the real Yarn ResourceManager, removing the 
> network factor by simulating NodeManagers and ApplicationMasters via handling 
> and dispatching NM/AM heartbeat events from within the same JVM.
> To keep track of scheduler behavior and performance, a scheduler wrapper 
> will wrap the real scheduler.
> The simulator will produce real-time metrics while executing, including:
> * Resource usage for the whole cluster and each queue, which can be used to 
> configure the capacity of the cluster and of each queue.
> * The detailed application execution trace (recorded in relation to simulated 
> time), which can be analyzed to understand/validate the scheduler behavior 
> (individual job turnaround time, throughput, fairness, capacity guarantee, 
> etc).
> * Several key metrics of the scheduler algorithm, such as the time cost of 
> each scheduler operation (allocate, handle, etc), which can be used by Hadoop 
> developers to find the code spots and scalability limits.
> The simulator will provide real-time charts showing the behavior of the 
> scheduler and its performance.
> A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, 
> showing how to use the simulator to simulate Fair Scheduler and Capacity 
> Scheduler.
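
The scheduler wrapper idea in the quoted description (wrap the real scheduler and record the time cost of each operation such as allocate and handle) can be sketched roughly as follows. This is only an illustration of the wrapping pattern: the Scheduler interface, CappedScheduler, and TimedScheduler are hypothetical, simplified names and are not the actual YARN or SLS classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TimedSchedulerSketch {
    /** Hypothetical, simplified scheduler interface; not the real YARN scheduler API. */
    interface Scheduler {
        int allocate(String appId, int requestedContainers);
        void handle(String event);
    }

    /** Toy scheduler that grants at most a fixed number of containers per call. */
    static class CappedScheduler implements Scheduler {
        private final int cap;
        CappedScheduler(int cap) { this.cap = cap; }
        public int allocate(String appId, int requestedContainers) {
            return Math.min(requestedContainers, cap);
        }
        public void handle(String event) { /* nothing to do in this toy example */ }
    }

    /** Wrapper that delegates every call and accumulates per-operation counts and time. */
    static class TimedScheduler implements Scheduler {
        private final Scheduler delegate;
        private final Map<String, Long> totalNanos = new LinkedHashMap<>();
        private final Map<String, Integer> calls = new LinkedHashMap<>();

        TimedScheduler(Scheduler delegate) { this.delegate = delegate; }

        private void record(String op, long startNanos) {
            totalNanos.merge(op, System.nanoTime() - startNanos, Long::sum);
            calls.merge(op, 1, Integer::sum);
        }

        public int allocate(String appId, int requestedContainers) {
            long start = System.nanoTime();
            try {
                return delegate.allocate(appId, requestedContainers);
            } finally {
                record("allocate", start); // recorded even if the delegate throws
            }
        }

        public void handle(String event) {
            long start = System.nanoTime();
            try {
                delegate.handle(event);
            } finally {
                record("handle", start);
            }
        }

        int callCount(String op) { return calls.getOrDefault(op, 0); }
        long totalNanos(String op) { return totalNanos.getOrDefault(op, 0L); }
    }

    public static void main(String[] args) {
        TimedScheduler scheduler = new TimedScheduler(new CappedScheduler(2));
        scheduler.allocate("app_1", 5);
        scheduler.handle("NODE_UPDATE");
        System.out.println("allocate calls: " + scheduler.callCount("allocate")
            + ", total ns: " + scheduler.totalNanos("allocate"));
    }
}
```

Because the wrapper implements the same interface as the scheduler it wraps, different schedulers could be swapped in behind it and measured with the same metrics.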


