[ 
https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833259#comment-16833259
 ] 

NedaMaleki edited comment on YARN-1021 at 5/5/19 8:12 AM:
----------------------------------------------------------

*Dear Wei Yan,*

*I am using Hadoop 2.4.1. When I try to run SLS, I run into the same problem:*

/usr/local/hadoop/share/hadoop/tools/sls/bin/slsrun.sh --input-rumen=/usr/local/hadoop/share/hadoop/tools/sls/sample-data/2jobs2min-rumen-jh.json --output-dir=/usr/local/hadoop/share/hadoop/tools/sls/output
19/05/05 11:52:31 INFO conf.Configuration: found resource core-site.xml at 
file:/usr/local/hadoop/etc/hadoop/core-site.xml
19/05/05 11:52:31 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
19/05/05 11:52:31 INFO security.Groups: clearing userToGroupsMap cache
19/05/05 11:52:31 INFO conf.Configuration: found resource yarn-site.xml at 
file:/usr/local/hadoop/etc/hadoop/yarn-site.xml
19/05/05 11:52:31 INFO event.AsyncDispatcher: Registering class 
org.apache.hadoop.yarn.server.resourcemanager.RMFatalEventType for class 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMFatalEventDispatcher
19/05/05 11:52:32 INFO security.NMTokenSecretManagerInRM: 
NMTokenKeyRollingInterval: 86400000ms and NMTokenKeyActivationDelay: 900000ms
19/05/05 11:52:32 INFO security.RMContainerTokenSecretManager: 
ContainerTokenKeyRollingInterval: 86400000ms and 
ContainerTokenKeyActivationDelay: 900000ms
19/05/05 11:52:32 INFO security.AMRMTokenSecretManager: Rolling master-key for 
amrm-tokens
19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStoreEventType 
for class 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler
 19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class 
org.apache.hadoop.yarn.server.resourcemanager.NodesListManagerEventType for 
class org.apache.hadoop.yarn.server.resourcemanager.NodesListManager
 19/05/05 11:52:32 INFO resourcemanager.ResourceManager: Using Scheduler: 
{color:#ff0000}*org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper*{color}
 {color:#ff0000}*java.lang.NullPointerException*{color}
 {color:#ff0000}    *at 
org.apache.hadoop.yarn.sls.web.SLSWebApp.<init>(SLSWebApp.java:82)*{color}
 {color:#ff0000}    *at 
org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.initMetrics(ResourceSchedulerWrapper.java:465)*{color}
 {color:#ff0000}    *at 
org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.setConf(ResourceSchedulerWrapper.java:164)*{color}
 {color:#ff0000}    *at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)*{color}
 {color:#ff0000}    *at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)*{color}
 {color:#ff0000}    *at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createScheduler(ResourceManager.java:261)*{color}
 {color:#ff0000}    *at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:403)*{color}
 {color:#ff0000}    *at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)*{color}
 {color:#ff0000}    *at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:824)*{color}
 {color:#ff0000}    *at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:226)*{color}
 {color:#ff0000}    *at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)*{color}
 {color:#ff0000}    *at 
org.apache.hadoop.yarn.sls.SLSRunner.startRM(SLSRunner.java:163)*{color}
 {color:#ff0000}    *at 
org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:137)*{color}
 {color:#ff0000}    *at 
org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:524)*{color}
 19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.SchedulerEventType
 for class 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher
 19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppEventType for class 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher
 19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEventType
 for class 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher
 19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class 
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType for class 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$NodeEventDispatcher
 19/05/05 11:52:32 INFO impl.MetricsConfig: loaded properties from 
hadoop-metrics2.properties
 19/05/05 11:52:32 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 
second(s).
 19/05/05 11:52:32 INFO impl.MetricsSystemImpl: ResourceManager metrics system 
started
 19/05/05 11:52:32 INFO conf.Configuration: found resource 
capacity-scheduler.xml at 
file:/usr/local/hadoop/etc/hadoop/capacity-scheduler.xml
 19/05/05 11:52:32 INFO capacity.ParentQueue: root, capacity=1.0, 
asboluteCapacity=1.0, maxCapacity=1.0, asboluteMaxCapacity=1.0, state=RUNNING, 
acls=SUBMIT_APPLICATIONS:*ADMINISTER_QUEUE:*
 19/05/05 11:52:32 INFO capacity.ParentQueue: Initialized parent-queue root 
name=root, fullname=root
 19/05/05 11:52:32 INFO capacity.LeafQueue: Initializing default
 capacity = 1.0 [= (float) configuredCapacity / 100 ]
 asboluteCapacity = 1.0 [= parentAbsoluteCapacity * capacity ]
 maxCapacity = 1.0 [= configuredMaxCapacity ]
 absoluteMaxCapacity = 1.0 [= 1.0 maximumCapacity undefined, 
(parentAbsoluteMaxCapacity * maximumCapacity) / 100 otherwise ]
 userLimit = 100 [= configuredUserLimit ]
 userLimitFactor = 1.0 [= configuredUserLimitFactor ]
 maxApplications = 10000 [= configuredMaximumSystemApplicationsPerQueue or 
(int)(configuredMaximumSystemApplications * absoluteCapacity)]
 maxApplicationsPerUser = 10000 [= (int)(maxApplications * (userLimit / 100.0f) 
* userLimitFactor) ]
 maxActiveApplications = 1 [= max((int)ceil((clusterResourceMemory / 
minimumAllocation) * maxAMResourcePerQueuePercent * absoluteMaxCapacity),1) ]
 maxActiveAppsUsingAbsCap = 1 [= max((int)ceil((clusterResourceMemory / 
minimumAllocation) *maxAMResourcePercent * absoluteCapacity),1) ]
 maxActiveApplicationsPerUser = 1 [= max((int)(maxActiveApplications * 
(userLimit / 100.0f) * userLimitFactor),1) ]
 usedCapacity = 0.0 [= usedResourcesMemory / (clusterResourceMemory * 
absoluteCapacity)]
 absoluteUsedCapacity = 0.0 [= usedResourcesMemory / clusterResourceMemory]
 maxAMResourcePerQueuePercent = 0.1 [= configuredMaximumAMResourcePercent ]
 minimumAllocationFactor = 0.875 [= (float)(maximumAllocationMemory - 
minimumAllocationMemory) / maximumAllocationMemory ]
 numContainers = 0 [= currentNumContainers ]
 state = RUNNING [= configuredState ]
 acls = SUBMIT_APPLICATIONS:*ADMINISTER_QUEUE:* [= configuredAcls ]
 nodeLocalityDelay = 40

19/05/05 11:52:32 INFO capacity.CapacityScheduler: Initialized queue: default: 
capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>, 
usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, numContainers=0
 19/05/05 11:52:32 INFO capacity.CapacityScheduler: Initialized queue: root: 
numChildQueue= 1, capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, 
vCores:0>usedCapacity=0.0, numApps=0, numContainers=0
 19/05/05 11:52:32 INFO capacity.CapacityScheduler: Initialized root queue 
root: numChildQueue= 1, capacity=1.0, absoluteCapacity=1.0, 
usedResources=<memory:0, vCores:0>usedCapacity=0.0, numApps=0, numContainers=0
 19/05/05 11:52:32 INFO capacity.CapacityScheduler: Initialized 
CapacityScheduler with calculator=class 
org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, 
minimumAllocation=<<memory:1024, vCores:1>>, maximumAllocation=<<memory:8192, 
vCores:32>>, asynchronousScheduling=false, asyncScheduleInterval=5ms
 19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManagerEventType for class 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager
 19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncherEventType 
for class 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher
 19/05/05 11:52:33 INFO resourcemanager.RMNMInfo: Registered RMNMInfo MBean
 19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class 
org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for 
class 
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler
 19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class 
org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for 
class 
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler
 19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class 
org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for 
class 
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler
 19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class 
org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for 
class 
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler
 19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class 
org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for 
class 
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler
 19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class 
org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for 
class 
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler
 19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class 
org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for 
class 
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler
 19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class 
org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for 
class 
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler
 19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class 
org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for 
class 
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler
 19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class 
org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for 
class 
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler
 19/05/05 11:52:33 INFO util.HostsFileReader: Refreshing hosts 
(include/exclude) list
 19/05/05 11:52:43 INFO resourcemanager.ResourceManager: Transitioning to 
active state
 19/05/05 11:52:43 INFO security.RMContainerTokenSecretManager: Rolling 
master-key for container-tokens
 19/05/05 11:52:43 INFO security.AMRMTokenSecretManager: Rolling master-key for 
amrm-tokens
 19/05/05 11:52:43 INFO security.NMTokenSecretManagerInRM: Rolling master-key 
for nm-tokens
 19/05/05 11:52:43 INFO delegation.AbstractDelegationTokenSecretManager: 
Updating the current master key for generating delegation tokens
 19/05/05 11:52:43 INFO security.RMDelegationTokenSecretManager: storing master 
key with keyID 1
 19/05/05 11:52:43 INFO delegation.AbstractDelegationTokenSecretManager: 
Starting expired delegation token remover thread, tokenRemoverScanInterval=60 
min(s)
 19/05/05 11:52:43 INFO delegation.AbstractDelegationTokenSecretManager: 
Updating the current master key for generating delegation tokens
 19/05/05 11:52:43 INFO security.RMDelegationTokenSecretManager: storing master 
key with keyID 2
 19/05/05 11:52:53 INFO ipc.CallQueueManager: Using callQueue class 
java.util.concurrent.LinkedBlockingQueue
 19/05/05 11:52:53 INFO ipc.Server: Starting Socket Reader #1 for port 8031
 19/05/05 11:52:53 INFO pb.RpcServerFactoryPBImpl: Adding protocol 
org.apache.hadoop.yarn.server.api.ResourceTrackerPB to the server
 19/05/05 11:52:53 INFO ipc.Server: IPC Server Responder: starting
 19/05/05 11:52:53 INFO ipc.Server: IPC Server listener on 8031: starting
 19/05/05 11:53:03 INFO ipc.CallQueueManager: Using callQueue class 
java.util.concurrent.LinkedBlockingQueue
 19/05/05 11:53:03 INFO ipc.Server: Starting Socket Reader #1 for port 8030
 19/05/05 11:53:03 INFO pb.RpcServerFactoryPBImpl: Adding protocol 
org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB to the server
 19/05/05 11:53:03 INFO ipc.Server: IPC Server listener on 8030: starting
 19/05/05 11:53:03 INFO ipc.Server: IPC Server Responder: starting
 19/05/05 11:53:13 INFO ipc.CallQueueManager: Using callQueue class 
java.util.concurrent.LinkedBlockingQueue
 19/05/05 11:53:13 INFO ipc.Server: Starting Socket Reader #1 for port 8032
 19/05/05 11:53:13 INFO pb.RpcServerFactoryPBImpl: Adding protocol 
org.apache.hadoop.yarn.api.ApplicationClientProtocolPB to the server
 19/05/05 11:53:13 INFO ipc.Server: IPC Server Responder: starting
 19/05/05 11:53:13 INFO ipc.Server: IPC Server listener on 8032: starting
 19/05/05 11:53:14 INFO resourcemanager.ResourceManager: Transitioned to active 
state
 19/05/05 11:53:14 INFO mortbay.log: Logging to 
org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
 19/05/05 11:53:14 INFO http.HttpRequestLog: Http request log for 
http.requests.resourcemanager is not defined
 19/05/05 11:53:14 INFO http.HttpServer2: Added global filter 'safety' 
(class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
 19/05/05 11:53:14 INFO http.HttpServer2: Added filter static_user_filter 
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to 
context cluster
 19/05/05 11:53:14 INFO http.HttpServer2: Added filter static_user_filter 
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to 
context logs
 19/05/05 11:53:14 INFO http.HttpServer2: Added filter static_user_filter 
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to 
context static
 19/05/05 11:53:14 INFO http.HttpServer2: adding path spec: /cluster/*
 19/05/05 11:53:14 INFO http.HttpServer2: adding path spec: /ws/*
 19/05/05 11:53:14 INFO http.HttpServer2: Jetty bound to port 8088
 19/05/05 11:53:14 INFO mortbay.log: jetty-6.1.26
 19/05/05 11:53:14 INFO mortbay.log: Extract 
jar:file:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-common-2.4.1.jar!/webapps/cluster
 to /tmp/Jetty_0_0_0_0_8088_cluster____u0rgz3/webapp
 19/05/05 11:53:14 INFO mortbay.log: Started SelectChannelConnector@0.0.0.0:8088
 19/05/05 11:53:14 INFO webapp.WebApps: Web app /cluster started at 8088
 19/05/05 11:53:14 INFO webapp.WebApps: Registered webapp guice modules
 19/05/05 11:53:25 INFO ipc.CallQueueManager: Using callQueue class 
java.util.concurrent.LinkedBlockingQueue
 19/05/05 11:53:25 INFO ipc.Server: Starting Socket Reader #1 for port 8033
 19/05/05 11:53:25 INFO pb.RpcServerFactoryPBImpl: Adding protocol 
org.apache.hadoop.yarn.server.api.ResourceManagerAdministrationProtocolPB to 
the server
 19/05/05 11:53:25 INFO ipc.Server: IPC Server Responder: starting
 19/05/05 11:53:25 INFO ipc.Server: IPC Server listener on 8033: starting
 19/05/05 11:53:45 INFO util.RackResolver: Resolved a2115.smile.com to 
/default-rack
 19/05/05 11:53:45 INFO resourcemanager.ResourceTrackerService: NodeManager 
from node a2115.smile.com(cmPort: 0 httpPort: 80) registered with capability: 
<memory:10240, vCores:10>, assigned nodeId a2115.smile.com:0
 19/05/05 11:53:45 INFO rmnode.RMNodeImpl: a2115.smile.com:0 Node Transitioned 
from NEW to RUNNING
 19/05/05 11:53:45 INFO capacity.CapacityScheduler: Added node 
a2115.smile.com:0 clusterResource: <memory:10240, vCores:10>
 19/05/05 11:54:05 INFO util.RackResolver: Resolved a2118.smile.com to 
/default-rack
 19/05/05 11:54:05 INFO resourcemanager.ResourceTrackerService: NodeManager 
from node a2118.smile.com(cmPort: 1 httpPort: 80) registered with capability: 
<memory:10240, vCores:10>, assigned nodeId a2118.smile.com:1
 19/05/05 11:54:05 INFO rmnode.RMNodeImpl: a2118.smile.com:1 Node Transitioned 
from NEW to RUNNING
 19/05/05 11:54:05 INFO capacity.CapacityScheduler: Added node 
a2118.smile.com:1 clusterResource: <memory:20480, vCores:20>
 19/05/05 11:54:25 INFO util.RackResolver: Resolved a2117.smile.com to 
/default-rack
 19/05/05 11:54:25 INFO resourcemanager.ResourceTrackerService: NodeManager 
from node a2117.smile.com(cmPort: 2 httpPort: 80) registered with capability: 
<memory:10240, vCores:10>, assigned nodeId a2117.smile.com:2
 19/05/05 11:54:25 INFO rmnode.RMNodeImpl: a2117.smile.com:2 Node Transitioned 
from NEW to RUNNING
 19/05/05 11:54:25 INFO capacity.CapacityScheduler: Added node 
a2117.smile.com:2 clusterResource: <memory:30720, vCores:30>
 19/05/05 11:54:45 INFO util.RackResolver: Resolved a2116.smile.com to 
/default-rack
 19/05/05 11:54:45 INFO resourcemanager.ResourceTrackerService: NodeManager 
from node a2116.smile.com(cmPort: 3 httpPort: 80) registered with capability: 
<memory:10240, vCores:10>, assigned nodeId a2116.smile.com:3
 19/05/05 11:54:45 INFO rmnode.RMNodeImpl: a2116.smile.com:3 Node Transitioned 
from NEW to RUNNING
 {color:#ff0000}19/05/05 11:54:45 INFO capacity.CapacityScheduler: Added node 
a2116.smile.com:3 clusterResource: <memory:40960, vCores:40>{color}
 {color:#ff0000}Exception in thread "main" java.lang.RuntimeException: 
java.lang.NullPointerException{color}
 {color:#ff0000}    at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131){color}
 {color:#ff0000}    at 
org.apache.hadoop.yarn.sls.SLSRunner.startAMFromRumenTraces(SLSRunner.java:394){color}
 {color:#ff0000}    at 
org.apache.hadoop.yarn.sls.SLSRunner.startAM(SLSRunner.java:246){color}
 {color:#ff0000}    at 
org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:141){color}
 {color:#ff0000}    at 
org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:524){color}
 {color:#ff0000}Caused by: java.lang.NullPointerException{color}
 {color:#ff0000}    at 
java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936){color}
 {color:#ff0000}    at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:123){color}
 {color:#ff0000}    ... 4 more{color}

*After waiting a few minutes, I got the following messages and then nothing more :(*

19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: 
Expired:a2115.smile.com:0 Timed out after 600 secs
 19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: 
Expired:a2118.smile.com:1 Timed out after 600 secs
 19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: 
Expired:a2117.smile.com:2 Timed out after 600 secs
 19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: 
Expired:a2116.smile.com:3 Timed out after 600 secs
 19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2115.smile.com:0 
as it is now LOST
 19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2115.smile.com:0 Node Transitioned 
from RUNNING to LOST
 19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2118.smile.com:1 
as it is now LOST
 19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2118.smile.com:1 Node Transitioned 
from RUNNING to LOST
 19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2117.smile.com:2 
as it is now LOST
 19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2117.smile.com:2 Node Transitioned 
from RUNNING to LOST
 19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2116.smile.com:3 
as it is now LOST
 19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2116.smile.com:3 Node Transitioned 
from RUNNING to LOST
 19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node 
a2115.smile.com:0 clusterResource: <memory:30720, vCores:30>
 19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node 
a2118.smile.com:1 clusterResource: <memory:20480, vCores:20>
 19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node 
a2117.smile.com:2 clusterResource: <memory:10240, vCores:10>
 19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node 
a2116.smile.com:3 clusterResource: <memory:0, vCores:0>

*1) I am looking forward to hearing from you, as I am stuck here!*

*2) My second question is: how can I extend SLS? That is, where should I
write my scheduler code in SLS so that I can run it and get results? (I need to
simulate my scheduler and then compare it with other schedulers such as FIFO,
Fair, and Capacity.)*

*Thanks a lot,*

*Neda*


http.requests.resourcemanager is not defined
19/05/05 11:53:14 INFO http.HttpServer2: Added global filter 'safety' 
(class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
19/05/05 11:53:14 INFO http.HttpServer2: Added filter static_user_filter 
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to 
context cluster
19/05/05 11:53:14 INFO http.HttpServer2: Added filter static_user_filter 
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to 
context logs
19/05/05 11:53:14 INFO http.HttpServer2: Added filter static_user_filter 
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to 
context static
19/05/05 11:53:14 INFO http.HttpServer2: adding path spec: /cluster/*
19/05/05 11:53:14 INFO http.HttpServer2: adding path spec: /ws/*
19/05/05 11:53:14 INFO http.HttpServer2: Jetty bound to port 8088
19/05/05 11:53:14 INFO mortbay.log: jetty-6.1.26
19/05/05 11:53:14 INFO mortbay.log: Extract 
jar:file:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-common-2.4.1.jar!/webapps/cluster
 to /tmp/Jetty_0_0_0_0_8088_cluster____u0rgz3/webapp
19/05/05 11:53:14 INFO mortbay.log: Started SelectChannelConnector@0.0.0.0:8088
19/05/05 11:53:14 INFO webapp.WebApps: Web app /cluster started at 8088
19/05/05 11:53:14 INFO webapp.WebApps: Registered webapp guice modules
19/05/05 11:53:25 INFO ipc.CallQueueManager: Using callQueue class 
java.util.concurrent.LinkedBlockingQueue
19/05/05 11:53:25 INFO ipc.Server: Starting Socket Reader #1 for port 8033
19/05/05 11:53:25 INFO pb.RpcServerFactoryPBImpl: Adding protocol 
org.apache.hadoop.yarn.server.api.ResourceManagerAdministrationProtocolPB to 
the server
19/05/05 11:53:25 INFO ipc.Server: IPC Server Responder: starting
19/05/05 11:53:25 INFO ipc.Server: IPC Server listener on 8033: starting
19/05/05 11:53:45 INFO util.RackResolver: Resolved a2115.smile.com to 
/default-rack
19/05/05 11:53:45 INFO resourcemanager.ResourceTrackerService: NodeManager from 
node a2115.smile.com(cmPort: 0 httpPort: 80) registered with capability: 
<memory:10240, vCores:10>, assigned nodeId a2115.smile.com:0
19/05/05 11:53:45 INFO rmnode.RMNodeImpl: a2115.smile.com:0 Node Transitioned 
from NEW to RUNNING
19/05/05 11:53:45 INFO capacity.CapacityScheduler: Added node a2115.smile.com:0 
clusterResource: <memory:10240, vCores:10>
19/05/05 11:54:05 INFO util.RackResolver: Resolved a2118.smile.com to 
/default-rack
19/05/05 11:54:05 INFO resourcemanager.ResourceTrackerService: NodeManager from 
node a2118.smile.com(cmPort: 1 httpPort: 80) registered with capability: 
<memory:10240, vCores:10>, assigned nodeId a2118.smile.com:1
19/05/05 11:54:05 INFO rmnode.RMNodeImpl: a2118.smile.com:1 Node Transitioned 
from NEW to RUNNING
19/05/05 11:54:05 INFO capacity.CapacityScheduler: Added node a2118.smile.com:1 
clusterResource: <memory:20480, vCores:20>
19/05/05 11:54:25 INFO util.RackResolver: Resolved a2117.smile.com to 
/default-rack
19/05/05 11:54:25 INFO resourcemanager.ResourceTrackerService: NodeManager from 
node a2117.smile.com(cmPort: 2 httpPort: 80) registered with capability: 
<memory:10240, vCores:10>, assigned nodeId a2117.smile.com:2
19/05/05 11:54:25 INFO rmnode.RMNodeImpl: a2117.smile.com:2 Node Transitioned 
from NEW to RUNNING
19/05/05 11:54:25 INFO capacity.CapacityScheduler: Added node a2117.smile.com:2 
clusterResource: <memory:30720, vCores:30>
19/05/05 11:54:45 INFO util.RackResolver: Resolved a2116.smile.com to 
/default-rack
19/05/05 11:54:45 INFO resourcemanager.ResourceTrackerService: NodeManager from 
node a2116.smile.com(cmPort: 3 httpPort: 80) registered with capability: 
<memory:10240, vCores:10>, assigned nodeId a2116.smile.com:3
19/05/05 11:54:45 INFO rmnode.RMNodeImpl: a2116.smile.com:3 Node Transitioned 
from NEW to RUNNING
{color:#FF0000}19/05/05 11:54:45 INFO capacity.CapacityScheduler: Added node 
a2116.smile.com:3 clusterResource: <memory:40960, vCores:40>{color}
{color:#FF0000}Exception in thread "main" java.lang.RuntimeException: 
java.lang.NullPointerException{color}
{color:#FF0000}    at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131){color}
{color:#FF0000}    at 
org.apache.hadoop.yarn.sls.SLSRunner.startAMFromRumenTraces(SLSRunner.java:394){color}
{color:#FF0000}    at 
org.apache.hadoop.yarn.sls.SLSRunner.startAM(SLSRunner.java:246){color}
{color:#FF0000}    at 
org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:141){color}
{color:#FF0000}    at 
org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:524){color}
{color:#FF0000}Caused by: java.lang.NullPointerException{color}
{color:#FF0000}    at 
java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936){color}
{color:#FF0000}    at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:123){color}
{color:#FF0000}    ... 4 more{color}
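
A note on the trace above: the innermost frame is ConcurrentHashMap.get, called from ReflectionUtils.newInstance, and ConcurrentHashMap rejects null keys. That pattern suggests the Class object handed to newInstance was null, which could happen if a lookup of the ApplicationMaster class by job type found no entry. This is an inference from the trace, not something the log confirms. The Java sketch below reproduces only the failure mode using the standard library; NullClassLookup, CACHE, and requireAmClass are hypothetical names, not SLS or Hadoop code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class NullClassLookup {
    // Stand-in for a cache keyed by Class (hypothetical, for illustration only).
    private static final Map<Class<?>, String> CACHE = new ConcurrentHashMap<>();

    // Returns true if looking up the given key throws NullPointerException.
    public static boolean lookupThrowsNpe(Class<?> key) {
        try {
            CACHE.get(key);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Hypothetical guard: report the missing mapping instead of passing null along.
    public static Class<?> requireAmClass(Map<String, Class<?>> amClassMap, String jobType) {
        Class<?> amClass = amClassMap.get(jobType);
        if (amClass == null) {
            throw new IllegalStateException("No AM class configured for job type: " + jobType);
        }
        return amClass;
    }

    public static void main(String[] args) {
        System.out.println(lookupThrowsNpe(null));         // null key is rejected
        System.out.println(lookupThrowsNpe(String.class)); // non-null key is fine

        Map<String, Class<?>> empty = new HashMap<>();
        try {
            requireAmClass(empty, "mapreduce");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If a guard like this were to fire for the job type used by the trace, the simulator configuration that maps job types to AM classes would be the first place to check.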

*After waiting a few minutes, I got the following messages, and then nothing more :(*

19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: 
Expired:a2115.smile.com:0 Timed out after 600 secs
19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: 
Expired:a2118.smile.com:1 Timed out after 600 secs
19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: 
Expired:a2117.smile.com:2 Timed out after 600 secs
19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: 
Expired:a2116.smile.com:3 Timed out after 600 secs
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2115.smile.com:0 
as it is now LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2115.smile.com:0 Node Transitioned 
from RUNNING to LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2118.smile.com:1 
as it is now LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2118.smile.com:1 Node Transitioned 
from RUNNING to LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2117.smile.com:2 
as it is now LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2117.smile.com:2 Node Transitioned 
from RUNNING to LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2116.smile.com:3 
as it is now LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2116.smile.com:3 Node Transitioned 
from RUNNING to LOST
19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node 
a2115.smile.com:0 clusterResource: <memory:30720, vCores:30>
19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node 
a2118.smile.com:1 clusterResource: <memory:20480, vCores:20>
19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node 
a2117.smile.com:2 clusterResource: <memory:10240, vCores:10>
19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node 
a2116.smile.com:3 clusterResource: <memory:0, vCores:0>

*1) I look forward to hearing from you, as I am stuck here!*

*2) My second question is: how can I extend SLS? That is, where should I 
write my scheduler code in SLS so that I can run it and get results? (I need 
to simulate my scheduler and then compare it with other schedulers such as 
FIFO, Fair, and Capacity.)*

*Thanks a lot,*

*Neda*

> Yarn Scheduler Load Simulator
> -----------------------------
>
>                 Key: YARN-1021
>                 URL: https://issues.apache.org/jira/browse/YARN-1021
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: scheduler
>            Reporter: Wei Yan
>            Assignee: Wei Yan
>            Priority: Major
>             Fix For: 2.3.0
>
>         Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, 
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf
>
>
> The Yarn Scheduler is a fertile area of interest with different 
> implementations, e.g., the Fifo, Capacity and Fair schedulers. Meanwhile, 
> several optimizations have also been made to improve scheduler performance 
> for different scenarios and workloads. Each scheduler algorithm has its own 
> set of features and drives scheduling decisions by many factors, such as 
> fairness, capacity guarantee, resource availability, etc. It is very 
> important to evaluate a scheduler algorithm thoroughly before we deploy it 
> in a production cluster. Unfortunately, it is currently non-trivial to 
> evaluate a scheduling algorithm. Evaluating in a real cluster is always 
> time-consuming and costly, and it is also very hard to find a large-enough 
> cluster. Hence, a simulator that can predict how well a scheduler algorithm 
> performs for a specific workload would be quite useful.
> We want to build a Scheduler Load Simulator to simulate large-scale Yarn 
> clusters and application loads on a single machine. This would be invaluable 
> in furthering Yarn by providing a tool for researchers and developers to 
> prototype new scheduler features and predict their behavior and performance 
> with a reasonable amount of confidence, thereby aiding rapid innovation.
> The simulator will exercise the real Yarn ResourceManager, removing the 
> network factor by simulating NodeManagers and ApplicationMasters via handling 
> and dispatching NM/AM heartbeat events from within the same JVM.
> To keep track of scheduler behavior and performance, a scheduler wrapper 
> will wrap the real scheduler.
> The simulator will produce real-time metrics while executing, including:
> * Resource usage for the whole cluster and each queue, which can be used to 
> configure cluster and queue capacity.
> * The detailed application execution trace (recorded in relation to simulated 
> time), which can be analyzed to understand/validate the scheduler behavior 
> (individual job turnaround time, throughput, fairness, capacity guarantee, 
> etc).
> * Several key metrics of the scheduler algorithm, such as the time cost of 
> each scheduler operation (allocate, handle, etc), which can be used by Hadoop 
> developers to find hot spots in the code and scalability limits.
> The simulator will provide real-time charts showing the behavior of the 
> scheduler and its performance.
> A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, 
> showing how to use the simulator to simulate the Fair Scheduler and the 
> Capacity Scheduler.
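
The scheduler wrapper described in the quoted issue text can be pictured as a decorator that forwards each call to the real scheduler and records how long the call took. The sketch below is a minimal, self-contained illustration of that idea only; Scheduler, GrantAllScheduler, and TimingWrapper are hypothetical names and do not mirror the YARN ResourceScheduler API or the actual SLS wrapper.

```java
import java.util.ArrayList;
import java.util.List;

public class TimingWrapperSketch {
    // Hypothetical minimal scheduler interface; NOT the YARN ResourceScheduler API.
    public interface Scheduler {
        String allocate(String appId, int memoryMb);
    }

    // Toy scheduler used only for illustration: it grants every request as asked.
    public static class GrantAllScheduler implements Scheduler {
        @Override
        public String allocate(String appId, int memoryMb) {
            return appId + ":" + memoryMb;
        }
    }

    // Wrapper that delegates to the wrapped scheduler and records per-call latency.
    public static class TimingWrapper implements Scheduler {
        private final Scheduler delegate;
        private final List<Long> allocateNanos = new ArrayList<>();

        public TimingWrapper(Scheduler delegate) {
            this.delegate = delegate;
        }

        @Override
        public String allocate(String appId, int memoryMb) {
            long start = System.nanoTime();
            try {
                return delegate.allocate(appId, memoryMb);
            } finally {
                // Record elapsed time even if the delegate throws.
                allocateNanos.add(System.nanoTime() - start);
            }
        }

        public int callCount() {
            return allocateNanos.size();
        }

        // Mean latency of allocate() in nanoseconds, or 0 if nothing was recorded.
        public double meanAllocateNanos() {
            if (allocateNanos.isEmpty()) {
                return 0.0;
            }
            long total = 0;
            for (long n : allocateNanos) {
                total += n;
            }
            return (double) total / allocateNanos.size();
        }
    }

    public static void main(String[] args) {
        TimingWrapper wrapper = new TimingWrapper(new GrantAllScheduler());
        wrapper.allocate("app_1", 1024);
        wrapper.allocate("app_2", 2048);
        System.out.println("calls=" + wrapper.callCount()
                + " meanNanos=" + wrapper.meanAllocateNanos());
    }
}
```

Because the wrapper implements the same interface as the scheduler it wraps, a different scheduler can be swapped in without changing the measuring code, which is what makes side-by-side comparison convenient.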



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
