[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833259#comment-16833259 ]
NedaMaleki edited comment on YARN-1021 at 5/5/19 8:12 AM:
----------------------------------------------------------

*Dear Wei Yan,*

*I use Hadoop 2.4.1. When I run SLS, I run into the same problem:*

/usr/local/hadoop/share/hadoop/tools/sls/bin/slsrun.sh --input-rumen=/usr/local/hadoop/share/hadoop/tools/sls/sample-data/2jobs2min-rumen-jh.json --output-dir=/usr/local/hadoop/share/hadoop/tools/sls/output

19/05/05 11:52:31 INFO conf.Configuration: found resource core-site.xml at file:/usr/local/hadoop/etc/hadoop/core-site.xml
19/05/05 11:52:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/05/05 11:52:31 INFO security.Groups: clearing userToGroupsMap cache
19/05/05 11:52:31 INFO conf.Configuration: found resource yarn-site.xml at file:/usr/local/hadoop/etc/hadoop/yarn-site.xml
19/05/05 11:52:31 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.RMFatalEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMFatalEventDispatcher
19/05/05 11:52:32 INFO security.NMTokenSecretManagerInRM: NMTokenKeyRollingInterval: 86400000ms and NMTokenKeyActivationDelay: 900000ms
19/05/05 11:52:32 INFO security.RMContainerTokenSecretManager: ContainerTokenKeyRollingInterval: 86400000ms and ContainerTokenKeyActivationDelay: 900000ms
19/05/05 11:52:32 INFO security.AMRMTokenSecretManager: Rolling master-key for amrm-tokens
19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStoreEventType for class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler
19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStoreEventType for class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler
19/05/05 11:52:32 INFO 
event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.NodesListManagerEventType for class org.apache.hadoop.yarn.server.resourcemanager.NodesListManager
19/05/05 11:52:32 INFO resourcemanager.ResourceManager: Using Scheduler: {color:#ff0000}*org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper*{color}
{color:#ff0000}*java.lang.NullPointerException*{color}
{color:#ff0000} *at org.apache.hadoop.yarn.sls.web.SLSWebApp.<init>(SLSWebApp.java:82)*{color}
{color:#ff0000} *at org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.initMetrics(ResourceSchedulerWrapper.java:465)*{color}
{color:#ff0000} *at org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.setConf(ResourceSchedulerWrapper.java:164)*{color}
{color:#ff0000} *at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)*{color}
{color:#ff0000} *at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)*{color}
{color:#ff0000} *at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createScheduler(ResourceManager.java:261)*{color}
{color:#ff0000} *at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:403)*{color}
{color:#ff0000} *at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)*{color}
{color:#ff0000} *at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:824)*{color}
{color:#ff0000} *at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:226)*{color}
{color:#ff0000} *at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)*{color}
{color:#ff0000} *at org.apache.hadoop.yarn.sls.SLSRunner.startRM(SLSRunner.java:163)*{color}
{color:#ff0000} *at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:137)*{color}
{color:#ff0000} *at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:524)*{color}
19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.SchedulerEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher
19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher
19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher
19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$NodeEventDispatcher
19/05/05 11:52:32 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
19/05/05 11:52:32 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
19/05/05 11:52:32 INFO impl.MetricsSystemImpl: ResourceManager metrics system started
19/05/05 11:52:32 INFO conf.Configuration: found resource capacity-scheduler.xml at file:/usr/local/hadoop/etc/hadoop/capacity-scheduler.xml
19/05/05 11:52:32 INFO capacity.ParentQueue: root, capacity=1.0, asboluteCapacity=1.0, maxCapacity=1.0, asboluteMaxCapacity=1.0, state=RUNNING, acls=SUBMIT_APPLICATIONS:*ADMINISTER_QUEUE:*
19/05/05 11:52:32 INFO capacity.ParentQueue: Initialized parent-queue root name=root, fullname=root
19/05/05 11:52:32 INFO capacity.LeafQueue: Initializing default
capacity = 1.0 [= (float) configuredCapacity / 100 ]
asboluteCapacity = 1.0 [= parentAbsoluteCapacity * capacity ]
maxCapacity = 1.0 [= configuredMaxCapacity ]
absoluteMaxCapacity = 1.0 [= 1.0 maximumCapacity undefined, (parentAbsoluteMaxCapacity * maximumCapacity) / 100 otherwise ]
userLimit = 100 [= configuredUserLimit ]
userLimitFactor = 1.0 [= configuredUserLimitFactor ]
maxApplications = 10000 [= configuredMaximumSystemApplicationsPerQueue or (int)(configuredMaximumSystemApplications * absoluteCapacity)]
maxApplicationsPerUser = 10000 [= (int)(maxApplications * (userLimit / 100.0f) * userLimitFactor) ]
maxActiveApplications = 1 [= max((int)ceil((clusterResourceMemory / minimumAllocation) * maxAMResourcePerQueuePercent * absoluteMaxCapacity),1) ]
maxActiveAppsUsingAbsCap = 1 [= max((int)ceil((clusterResourceMemory / minimumAllocation) *maxAMResourcePercent * absoluteCapacity),1) ]
maxActiveApplicationsPerUser = 1 [= max((int)(maxActiveApplications * (userLimit / 100.0f) * userLimitFactor),1) ]
usedCapacity = 0.0 [= usedResourcesMemory / (clusterResourceMemory * absoluteCapacity)]
absoluteUsedCapacity = 0.0 [= usedResourcesMemory / clusterResourceMemory]
maxAMResourcePerQueuePercent = 0.1 [= configuredMaximumAMResourcePercent ]
minimumAllocationFactor = 0.875 [= (float)(maximumAllocationMemory - minimumAllocationMemory) / 
maximumAllocationMemory ]
numContainers = 0 [= currentNumContainers ]
state = RUNNING [= configuredState ]
acls = SUBMIT_APPLICATIONS:*ADMINISTER_QUEUE:* [= configuredAcls ]
nodeLocalityDelay = 40
19/05/05 11:52:32 INFO capacity.CapacityScheduler: Initialized queue: default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, numContainers=0
19/05/05 11:52:32 INFO capacity.CapacityScheduler: Initialized queue: root: numChildQueue= 1, capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>usedCapacity=0.0, numApps=0, numContainers=0
19/05/05 11:52:32 INFO capacity.CapacityScheduler: Initialized root queue root: numChildQueue= 1, capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>usedCapacity=0.0, numApps=0, numContainers=0
19/05/05 11:52:32 INFO capacity.CapacityScheduler: Initialized CapacityScheduler with calculator=class org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, minimumAllocation=<<memory:1024, vCores:1>>, maximumAllocation=<<memory:8192, vCores:32>>, asynchronousScheduling=false, asyncScheduleInterval=5ms
19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.RMAppManagerEventType for class org.apache.hadoop.yarn.server.resourcemanager.RMAppManager
19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncherEventType for class org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher
19/05/05 11:52:33 INFO resourcemanager.RMNMInfo: Registered RMNMInfo MBean
19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for class org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler
19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class 
org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for class org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler 19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for class org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler 19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for class org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler 19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for class org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler 19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for class org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler 19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for class org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler 19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for class org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler 19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for class org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler 19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class 
org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for class org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler
19/05/05 11:52:33 INFO util.HostsFileReader: Refreshing hosts (include/exclude) list
19/05/05 11:52:43 INFO resourcemanager.ResourceManager: Transitioning to active state
19/05/05 11:52:43 INFO security.RMContainerTokenSecretManager: Rolling master-key for container-tokens
19/05/05 11:52:43 INFO security.AMRMTokenSecretManager: Rolling master-key for amrm-tokens
19/05/05 11:52:43 INFO security.NMTokenSecretManagerInRM: Rolling master-key for nm-tokens
19/05/05 11:52:43 INFO delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
19/05/05 11:52:43 INFO security.RMDelegationTokenSecretManager: storing master key with keyID 1
19/05/05 11:52:43 INFO delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)
19/05/05 11:52:43 INFO delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
19/05/05 11:52:43 INFO security.RMDelegationTokenSecretManager: storing master key with keyID 2
19/05/05 11:52:53 INFO ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
19/05/05 11:52:53 INFO ipc.Server: Starting Socket Reader #1 for port 8031
19/05/05 11:52:53 INFO pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.server.api.ResourceTrackerPB to the server
19/05/05 11:52:53 INFO ipc.Server: IPC Server Responder: starting
19/05/05 11:52:53 INFO ipc.Server: IPC Server listener on 8031: starting
19/05/05 11:53:03 INFO ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
19/05/05 11:53:03 INFO ipc.Server: Starting Socket Reader #1 for port 8030
19/05/05 11:53:03 INFO pb.RpcServerFactoryPBImpl: Adding protocol 
org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB to the server
19/05/05 11:53:03 INFO ipc.Server: IPC Server listener on 8030: starting
19/05/05 11:53:03 INFO ipc.Server: IPC Server Responder: starting
19/05/05 11:53:13 INFO ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
19/05/05 11:53:13 INFO ipc.Server: Starting Socket Reader #1 for port 8032
19/05/05 11:53:13 INFO pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.api.ApplicationClientProtocolPB to the server
19/05/05 11:53:13 INFO ipc.Server: IPC Server Responder: starting
19/05/05 11:53:13 INFO ipc.Server: IPC Server listener on 8032: starting
19/05/05 11:53:14 INFO resourcemanager.ResourceManager: Transitioned to active state
19/05/05 11:53:14 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
19/05/05 11:53:14 INFO http.HttpRequestLog: Http request log for http.requests.resourcemanager is not defined
19/05/05 11:53:14 INFO http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
19/05/05 11:53:14 INFO http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context cluster
19/05/05 11:53:14 INFO http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
19/05/05 11:53:14 INFO http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
19/05/05 11:53:14 INFO http.HttpServer2: adding path spec: /cluster/*
19/05/05 11:53:14 INFO http.HttpServer2: adding path spec: /ws/*
19/05/05 11:53:14 INFO http.HttpServer2: Jetty bound to port 8088
19/05/05 11:53:14 INFO mortbay.log: jetty-6.1.26
19/05/05 11:53:14 INFO mortbay.log: Extract 
jar:file:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-common-2.4.1.jar!/webapps/cluster to /tmp/Jetty_0_0_0_0_8088_cluster____u0rgz3/webapp
19/05/05 11:53:14 INFO mortbay.log: Started SelectChannelConnector@0.0.0.0:8088
19/05/05 11:53:14 INFO webapp.WebApps: Web app /cluster started at 8088
19/05/05 11:53:14 INFO webapp.WebApps: Registered webapp guice modules
19/05/05 11:53:25 INFO ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
19/05/05 11:53:25 INFO ipc.Server: Starting Socket Reader #1 for port 8033
19/05/05 11:53:25 INFO pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.server.api.ResourceManagerAdministrationProtocolPB to the server
19/05/05 11:53:25 INFO ipc.Server: IPC Server Responder: starting
19/05/05 11:53:25 INFO ipc.Server: IPC Server listener on 8033: starting
19/05/05 11:53:45 INFO util.RackResolver: Resolved a2115.smile.com to /default-rack
19/05/05 11:53:45 INFO resourcemanager.ResourceTrackerService: NodeManager from node a2115.smile.com(cmPort: 0 httpPort: 80) registered with capability: <memory:10240, vCores:10>, assigned nodeId a2115.smile.com:0
19/05/05 11:53:45 INFO rmnode.RMNodeImpl: a2115.smile.com:0 Node Transitioned from NEW to RUNNING
19/05/05 11:53:45 INFO capacity.CapacityScheduler: Added node a2115.smile.com:0 clusterResource: <memory:10240, vCores:10>
19/05/05 11:54:05 INFO util.RackResolver: Resolved a2118.smile.com to /default-rack
19/05/05 11:54:05 INFO resourcemanager.ResourceTrackerService: NodeManager from node a2118.smile.com(cmPort: 1 httpPort: 80) registered with capability: <memory:10240, vCores:10>, assigned nodeId a2118.smile.com:1
19/05/05 11:54:05 INFO rmnode.RMNodeImpl: a2118.smile.com:1 Node Transitioned from NEW to RUNNING
19/05/05 11:54:05 INFO capacity.CapacityScheduler: Added node a2118.smile.com:1 clusterResource: <memory:20480, vCores:20>
19/05/05 11:54:25 INFO 
util.RackResolver: Resolved a2117.smile.com to /default-rack
19/05/05 11:54:25 INFO resourcemanager.ResourceTrackerService: NodeManager from node a2117.smile.com(cmPort: 2 httpPort: 80) registered with capability: <memory:10240, vCores:10>, assigned nodeId a2117.smile.com:2
19/05/05 11:54:25 INFO rmnode.RMNodeImpl: a2117.smile.com:2 Node Transitioned from NEW to RUNNING
19/05/05 11:54:25 INFO capacity.CapacityScheduler: Added node a2117.smile.com:2 clusterResource: <memory:30720, vCores:30>
19/05/05 11:54:45 INFO util.RackResolver: Resolved a2116.smile.com to /default-rack
19/05/05 11:54:45 INFO resourcemanager.ResourceTrackerService: NodeManager from node a2116.smile.com(cmPort: 3 httpPort: 80) registered with capability: <memory:10240, vCores:10>, assigned nodeId a2116.smile.com:3
19/05/05 11:54:45 INFO rmnode.RMNodeImpl: a2116.smile.com:3 Node Transitioned from NEW to RUNNING
{color:#ff0000}19/05/05 11:54:45 INFO capacity.CapacityScheduler: Added node a2116.smile.com:3 clusterResource: <memory:40960, vCores:40>{color}
{color:#ff0000}Exception in thread "main" java.lang.RuntimeException: java.lang.NullPointerException{color}
{color:#ff0000} at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131){color}
{color:#ff0000} at org.apache.hadoop.yarn.sls.SLSRunner.startAMFromRumenTraces(SLSRunner.java:394){color}
{color:#ff0000} at org.apache.hadoop.yarn.sls.SLSRunner.startAM(SLSRunner.java:246){color}
{color:#ff0000} at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:141){color}
{color:#ff0000} at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:524){color}
{color:#ff0000}Caused by: java.lang.NullPointerException{color}
{color:#ff0000} at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936){color}
{color:#ff0000} at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:123){color}
{color:#ff0000} ... 
4 more{color}

*After waiting a few minutes, I got the following messages and then nothing :(*

19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: Expired:a2115.smile.com:0 Timed out after 600 secs
19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: Expired:a2118.smile.com:1 Timed out after 600 secs
19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: Expired:a2117.smile.com:2 Timed out after 600 secs
19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: Expired:a2116.smile.com:3 Timed out after 600 secs
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2115.smile.com:0 as it is now LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2115.smile.com:0 Node Transitioned from RUNNING to LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2118.smile.com:1 as it is now LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2118.smile.com:1 Node Transitioned from RUNNING to LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2117.smile.com:2 as it is now LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2117.smile.com:2 Node Transitioned from RUNNING to LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2116.smile.com:3 as it is now LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2116.smile.com:3 Node Transitioned from RUNNING to LOST
19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node a2115.smile.com:0 clusterResource: <memory:30720, vCores:30>
19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node a2118.smile.com:1 clusterResource: <memory:20480, vCores:20>
19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node a2117.smile.com:2 clusterResource: <memory:10240, vCores:10>
19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node a2116.smile.com:3 clusterResource: <memory:0, vCores:0>

*1) I look forward to hearing from you, as I am stuck here!*

*2) My second question is: how can I extend SLS? I mean, where should I write my scheduler code in SLS, run it, and get results?
(I need to simulate my scheduler and then compare it with other schedulers like FIFO, Fair, and Capacity)* *Thanks a lot,* *Neda* was (Author: nedamaleki): *Dear Wei Yan,* *I use hadoop 2.4.1. When I want to run SLS, I face with the same problem as :* RMTokenSecretManager: Rolling master-key for amrm-tokens 19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStoreEventType for class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler 19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.NodesListManagerEventType for class org.apache.hadoop.yarn.server.resourcemanager.NodesListManager 19/05/05 11:52:32 INFO resourcemanager.ResourceManager: Using Scheduler: {color:#FF0000}*org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper*{color} {color:#FF0000}*java.lang.NullPointerException*{color} {color:#FF0000} *at org.apache.hadoop.yarn.sls.web.SLSWebApp.<init>(SLSWebApp.java:82)*{color} {color:#FF0000} *at org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.initMetrics(ResourceSchedulerWrapper.java:465)*{color} {color:#FF0000} *at org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.setConf(ResourceSchedulerWrapper.java:164)*{color} {color:#FF0000} *at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)*{color} {color:#FF0000} *at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)*{color} {color:#FF0000} *at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createScheduler(ResourceManager.java:261)*{color} {color:#FF0000} *at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:403)*{color} {color:#FF0000} *at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)*{color} {color:#FF0000} *at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:824)*{color} {color:#FF0000} *at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:226)*{color} {color:#FF0000} *at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)*{color} {color:#FF0000} *at org.apache.hadoop.yarn.sls.SLSRunner.startRM(SLSRunner.java:163)*{color} {color:#FF0000} *at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:137)*{color} {color:#FF0000} *at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:524)*{color} 19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.SchedulerEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher 19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher 19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher 19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$NodeEventDispatcher 19/05/05 11:52:32 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties 19/05/05 11:52:32 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s). 
19/05/05 11:52:32 INFO impl.MetricsSystemImpl: ResourceManager metrics system started 19/05/05 11:52:32 INFO conf.Configuration: found resource capacity-scheduler.xml at file:/usr/local/hadoop/etc/hadoop/capacity-scheduler.xml 19/05/05 11:52:32 INFO capacity.ParentQueue: root, capacity=1.0, asboluteCapacity=1.0, maxCapacity=1.0, asboluteMaxCapacity=1.0, state=RUNNING, acls=SUBMIT_APPLICATIONS:*ADMINISTER_QUEUE:* 19/05/05 11:52:32 INFO capacity.ParentQueue: Initialized parent-queue root name=root, fullname=root 19/05/05 11:52:32 INFO capacity.LeafQueue: Initializing default capacity = 1.0 [= (float) configuredCapacity / 100 ] asboluteCapacity = 1.0 [= parentAbsoluteCapacity * capacity ] maxCapacity = 1.0 [= configuredMaxCapacity ] absoluteMaxCapacity = 1.0 [= 1.0 maximumCapacity undefined, (parentAbsoluteMaxCapacity * maximumCapacity) / 100 otherwise ] userLimit = 100 [= configuredUserLimit ] userLimitFactor = 1.0 [= configuredUserLimitFactor ] maxApplications = 10000 [= configuredMaximumSystemApplicationsPerQueue or (int)(configuredMaximumSystemApplications * absoluteCapacity)] maxApplicationsPerUser = 10000 [= (int)(maxApplications * (userLimit / 100.0f) * userLimitFactor) ] maxActiveApplications = 1 [= max((int)ceil((clusterResourceMemory / minimumAllocation) * maxAMResourcePerQueuePercent * absoluteMaxCapacity),1) ] maxActiveAppsUsingAbsCap = 1 [= max((int)ceil((clusterResourceMemory / minimumAllocation) *maxAMResourcePercent * absoluteCapacity),1) ] maxActiveApplicationsPerUser = 1 [= max((int)(maxActiveApplications * (userLimit / 100.0f) * userLimitFactor),1) ] usedCapacity = 0.0 [= usedResourcesMemory / (clusterResourceMemory * absoluteCapacity)] absoluteUsedCapacity = 0.0 [= usedResourcesMemory / clusterResourceMemory] maxAMResourcePerQueuePercent = 0.1 [= configuredMaximumAMResourcePercent ] minimumAllocationFactor = 0.875 [= (float)(maximumAllocationMemory - minimumAllocationMemory) / maximumAllocationMemory ] numContainers = 0 [= currentNumContainers ] 
state = RUNNING [= configuredState ] acls = SUBMIT_APPLICATIONS:*ADMINISTER_QUEUE:* [= configuredAcls ] nodeLocalityDelay = 40 19/05/05 11:52:32 INFO capacity.CapacityScheduler: Initialized queue: default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, numContainers=0 19/05/05 11:52:32 INFO capacity.CapacityScheduler: Initialized queue: root: numChildQueue= 1, capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>usedCapacity=0.0, numApps=0, numContainers=0 19/05/05 11:52:32 INFO capacity.CapacityScheduler: Initialized root queue root: numChildQueue= 1, capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>usedCapacity=0.0, numApps=0, numContainers=0 19/05/05 11:52:32 INFO capacity.CapacityScheduler: Initialized CapacityScheduler with calculator=class org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, minimumAllocation=<<memory:1024, vCores:1>>, maximumAllocation=<<memory:8192, vCores:32>>, asynchronousScheduling=false, asyncScheduleInterval=5ms 19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.RMAppManagerEventType for class org.apache.hadoop.yarn.server.resourcemanager.RMAppManager 19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncherEventType for class org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher 19/05/05 11:52:33 INFO resourcemanager.RMNMInfo: Registered RMNMInfo MBean 19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for class org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler 19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for class 
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler 19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for class org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler 19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for class org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler 19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for class org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler 19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for class org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler 19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for class org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler 19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for class org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler 19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for class org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler 19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for class 
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler
19/05/05 11:52:33 INFO util.HostsFileReader: Refreshing hosts (include/exclude) list
19/05/05 11:52:43 INFO resourcemanager.ResourceManager: Transitioning to active state
19/05/05 11:52:43 INFO security.RMContainerTokenSecretManager: Rolling master-key for container-tokens
19/05/05 11:52:43 INFO security.AMRMTokenSecretManager: Rolling master-key for amrm-tokens
19/05/05 11:52:43 INFO security.NMTokenSecretManagerInRM: Rolling master-key for nm-tokens
19/05/05 11:52:43 INFO delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
19/05/05 11:52:43 INFO security.RMDelegationTokenSecretManager: storing master key with keyID 1
19/05/05 11:52:43 INFO delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)
19/05/05 11:52:43 INFO delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
19/05/05 11:52:43 INFO security.RMDelegationTokenSecretManager: storing master key with keyID 2
19/05/05 11:52:53 INFO ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
19/05/05 11:52:53 INFO ipc.Server: Starting Socket Reader #1 for port 8031
19/05/05 11:52:53 INFO pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.server.api.ResourceTrackerPB to the server
19/05/05 11:52:53 INFO ipc.Server: IPC Server Responder: starting
19/05/05 11:52:53 INFO ipc.Server: IPC Server listener on 8031: starting
19/05/05 11:53:03 INFO ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
19/05/05 11:53:03 INFO ipc.Server: Starting Socket Reader #1 for port 8030
19/05/05 11:53:03 INFO pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB to the server
19/05/05 11:53:03 INFO ipc.Server: IPC Server listener on 8030: starting
19/05/05 11:53:03 INFO ipc.Server: IPC Server Responder: starting
19/05/05 11:53:13 INFO ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
19/05/05 11:53:13 INFO ipc.Server: Starting Socket Reader #1 for port 8032
19/05/05 11:53:13 INFO pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.api.ApplicationClientProtocolPB to the server
19/05/05 11:53:13 INFO ipc.Server: IPC Server Responder: starting
19/05/05 11:53:13 INFO ipc.Server: IPC Server listener on 8032: starting
19/05/05 11:53:14 INFO resourcemanager.ResourceManager: Transitioned to active state
19/05/05 11:53:14 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
19/05/05 11:53:14 INFO http.HttpRequestLog: Http request log for http.requests.resourcemanager is not defined
19/05/05 11:53:14 INFO http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
19/05/05 11:53:14 INFO http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context cluster
19/05/05 11:53:14 INFO http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
19/05/05 11:53:14 INFO http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
19/05/05 11:53:14 INFO http.HttpServer2: adding path spec: /cluster/*
19/05/05 11:53:14 INFO http.HttpServer2: adding path spec: /ws/*
19/05/05 11:53:14 INFO http.HttpServer2: Jetty bound to port 8088
19/05/05 11:53:14 INFO mortbay.log: jetty-6.1.26
19/05/05 11:53:14 INFO mortbay.log: Extract jar:file:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-common-2.4.1.jar!/webapps/cluster to /tmp/Jetty_0_0_0_0_8088_cluster____u0rgz3/webapp
19/05/05 11:53:14 INFO mortbay.log: Started SelectChannelConnector@0.0.0.0:8088
19/05/05 11:53:14 INFO webapp.WebApps: Web app /cluster started at 8088
19/05/05 11:53:14 INFO webapp.WebApps: Registered webapp guice modules
19/05/05 11:53:25 INFO ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
19/05/05 11:53:25 INFO ipc.Server: Starting Socket Reader #1 for port 8033
19/05/05 11:53:25 INFO pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.server.api.ResourceManagerAdministrationProtocolPB to the server
19/05/05 11:53:25 INFO ipc.Server: IPC Server Responder: starting
19/05/05 11:53:25 INFO ipc.Server: IPC Server listener on 8033: starting
19/05/05 11:53:45 INFO util.RackResolver: Resolved a2115.smile.com to /default-rack
19/05/05 11:53:45 INFO resourcemanager.ResourceTrackerService: NodeManager from node a2115.smile.com(cmPort: 0 httpPort: 80) registered with capability: <memory:10240, vCores:10>, assigned nodeId a2115.smile.com:0
19/05/05 11:53:45 INFO rmnode.RMNodeImpl: a2115.smile.com:0 Node Transitioned from NEW to RUNNING
19/05/05 11:53:45 INFO capacity.CapacityScheduler: Added node a2115.smile.com:0 clusterResource: <memory:10240, vCores:10>
19/05/05 11:54:05 INFO util.RackResolver: Resolved a2118.smile.com to /default-rack
19/05/05 11:54:05 INFO resourcemanager.ResourceTrackerService: NodeManager from node a2118.smile.com(cmPort: 1 httpPort: 80) registered with capability: <memory:10240, vCores:10>, assigned nodeId a2118.smile.com:1
19/05/05 11:54:05 INFO rmnode.RMNodeImpl: a2118.smile.com:1 Node Transitioned from NEW to RUNNING
19/05/05 11:54:05 INFO capacity.CapacityScheduler: Added node a2118.smile.com:1 clusterResource: <memory:20480, vCores:20>
19/05/05 11:54:25 INFO util.RackResolver: Resolved a2117.smile.com to /default-rack
19/05/05 11:54:25 INFO resourcemanager.ResourceTrackerService: NodeManager from node a2117.smile.com(cmPort: 2 httpPort: 80) registered with capability: <memory:10240, vCores:10>, assigned nodeId a2117.smile.com:2
19/05/05 11:54:25 INFO rmnode.RMNodeImpl: a2117.smile.com:2 Node Transitioned from NEW to RUNNING
19/05/05 11:54:25 INFO capacity.CapacityScheduler: Added node a2117.smile.com:2 clusterResource: <memory:30720, vCores:30>
19/05/05 11:54:45 INFO util.RackResolver: Resolved a2116.smile.com to /default-rack
19/05/05 11:54:45 INFO resourcemanager.ResourceTrackerService: NodeManager from node a2116.smile.com(cmPort: 3 httpPort: 80) registered with capability: <memory:10240, vCores:10>, assigned nodeId a2116.smile.com:3
19/05/05 11:54:45 INFO rmnode.RMNodeImpl: a2116.smile.com:3 Node Transitioned from NEW to RUNNING
{color:#FF0000}19/05/05 11:54:45 INFO capacity.CapacityScheduler: Added node a2116.smile.com:3 clusterResource: <memory:40960, vCores:40>{color}
{color:#FF0000}Exception in thread "main" java.lang.RuntimeException: java.lang.NullPointerException{color}
{color:#FF0000} at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131){color}
{color:#FF0000} at org.apache.hadoop.yarn.sls.SLSRunner.startAMFromRumenTraces(SLSRunner.java:394){color}
{color:#FF0000} at org.apache.hadoop.yarn.sls.SLSRunner.startAM(SLSRunner.java:246){color}
{color:#FF0000} at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:141){color}
{color:#FF0000} at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:524){color}
{color:#FF0000}Caused by: java.lang.NullPointerException{color}
{color:#FF0000} at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936){color}
{color:#FF0000} at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:123){color}
{color:#FF0000} ... 4 more{color}

*After waiting a few minutes, I got the following messages and then nothing :(*

19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: Expired:a2115.smile.com:0 Timed out after 600 secs
19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: Expired:a2118.smile.com:1 Timed out after 600 secs
19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: Expired:a2117.smile.com:2 Timed out after 600 secs
19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: Expired:a2116.smile.com:3 Timed out after 600 secs
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2115.smile.com:0 as it is now LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2115.smile.com:0 Node Transitioned from RUNNING to LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2118.smile.com:1 as it is now LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2118.smile.com:1 Node Transitioned from RUNNING to LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2117.smile.com:2 as it is now LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2117.smile.com:2 Node Transitioned from RUNNING to LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2116.smile.com:3 as it is now LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2116.smile.com:3 Node Transitioned from RUNNING to LOST
19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node a2115.smile.com:0 clusterResource: <memory:30720, vCores:30>
19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node a2118.smile.com:1 clusterResource: <memory:20480, vCores:20>
19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node a2117.smile.com:2 clusterResource: <memory:10240, vCores:10>
19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node a2116.smile.com:3 clusterResource: <memory:0, vCores:0>

*1) I look forward to hearing from you, as I am stuck here!*

*2) My second question is: how can I extend SLS? I mean, where should I write my scheduler code in SLS, run it, and get results?
(I need to simulate my scheduler and then compare it with other schedulers such as FIFO, Fair, and Capacity.)*

*Thanks a lot,*
*Neda*

> Yarn Scheduler Load Simulator
> -----------------------------
>
> Key: YARN-1021
> URL: https://issues.apache.org/jira/browse/YARN-1021
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: scheduler
> Reporter: Wei Yan
> Assignee: Wei Yan
> Priority: Major
> Fix For: 2.3.0
>
> Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz,
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch,
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch,
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch,
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf
>
>
> The Yarn Scheduler is a fertile area of interest with different
> implementations, e.g., Fifo, Capacity and Fair schedulers. Meanwhile,
> several optimizations are also made to improve scheduler performance for
> different scenarios and workloads. Each scheduler algorithm has its own set of
> features, and drives scheduling decisions by many factors, such as fairness,
> capacity guarantee, resource availability, etc. It is very important to
> evaluate a scheduler algorithm thoroughly before we deploy it in a production
> cluster. Unfortunately, it is currently non-trivial to evaluate a scheduling
> algorithm. Evaluating in a real cluster is always time-consuming and costly,
> and it is also very hard to find a large-enough cluster. Hence, a simulator
> which can predict how well a scheduler algorithm performs for some specific
> workload would be quite useful.
>
> We want to build a Scheduler Load Simulator to simulate large-scale Yarn
> clusters and application loads on a single machine. This would be invaluable
> in furthering Yarn by providing a tool for researchers and developers to
> prototype new scheduler features and predict their behavior and performance
> with a reasonable amount of confidence, thereby aiding rapid innovation.
>
> The simulator will exercise the real Yarn ResourceManager, removing the
> network factor by simulating NodeManagers and ApplicationMasters via handling
> and dispatching NM/AM heartbeat events from within the same JVM.
>
> To keep track of scheduler behavior and performance, a scheduler wrapper
> will wrap the real scheduler.
>
> The simulator will produce real-time metrics while executing, including:
> * Resource usage for the whole cluster and each queue, which can be utilized
> to configure the capacity of the cluster and its queues.
> * The detailed application execution trace (recorded in relation to simulated
> time), which can be analyzed to understand/validate the scheduler behavior
> (individual job turnaround time, throughput, fairness, capacity guarantee,
> etc.).
> * Several key metrics of the scheduler algorithm, such as the time cost of each
> scheduler operation (allocate, handle, etc.), which can be utilized by Hadoop
> developers to find the code spots and scalability limits.
>
> The simulator will provide real-time charts showing the behavior of the
> scheduler and its performance.
>
> A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, showing
> how to use the simulator to simulate the Fair Scheduler and Capacity Scheduler.
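The quoted issue text says a scheduler wrapper will wrap the real scheduler to record the time cost of each operation (allocate, handle, etc.). A minimal sketch of that idea follows. Everything in it is an assumption made for illustration: the `Scheduler` interface, `ToyScheduler`, and `TimedScheduler` are hypothetical names, not the real YARN `ResourceScheduler` API and not the wrapper class shipped with SLS.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class TimedSchedulerSketch {

    /** Hypothetical, heavily simplified scheduler interface (not the YARN API). */
    interface Scheduler {
        int allocate(String appId, int requestedContainers);
        void handle(String event);
    }

    /** Toy scheduler: hands out containers until a fixed capacity is used up. */
    static class ToyScheduler implements Scheduler {
        private int free;

        ToyScheduler(int capacity) {
            this.free = capacity;
        }

        @Override
        public int allocate(String appId, int requestedContainers) {
            int granted = Math.min(requestedContainers, free);
            free -= granted;
            return granted;
        }

        @Override
        public void handle(String event) {
            // A real scheduler would react to node/application events here.
        }
    }

    /** Wrapper: delegates every call and records call count and elapsed time. */
    static class TimedScheduler implements Scheduler {
        private final Scheduler delegate;
        private final Map<String, AtomicLong> calls = new ConcurrentHashMap<>();
        private final Map<String, AtomicLong> nanos = new ConcurrentHashMap<>();

        TimedScheduler(Scheduler delegate) {
            this.delegate = delegate;
        }

        private void record(String op, long startNanos) {
            long elapsed = System.nanoTime() - startNanos;
            calls.computeIfAbsent(op, k -> new AtomicLong()).incrementAndGet();
            nanos.computeIfAbsent(op, k -> new AtomicLong()).addAndGet(elapsed);
        }

        @Override
        public int allocate(String appId, int requestedContainers) {
            long start = System.nanoTime();
            try {
                return delegate.allocate(appId, requestedContainers);
            } finally {
                record("allocate", start);
            }
        }

        @Override
        public void handle(String event) {
            long start = System.nanoTime();
            try {
                delegate.handle(event);
            } finally {
                record("handle", start);
            }
        }

        long callCount(String op) {
            AtomicLong c = calls.get(op);
            return c == null ? 0L : c.get();
        }

        long totalNanos(String op) {
            AtomicLong n = nanos.get(op);
            return n == null ? 0L : n.get();
        }
    }

    public static void main(String[] args) {
        TimedScheduler scheduler = new TimedScheduler(new ToyScheduler(10));
        System.out.println(scheduler.allocate("app_1", 4)); // 4
        System.out.println(scheduler.allocate("app_2", 8)); // 6 (only 6 left)
        scheduler.handle("NODE_UPDATE");
        System.out.println("allocate calls: " + scheduler.callCount("allocate"));
        System.out.println("allocate nanos: " + scheduler.totalNanos("allocate"));
    }
}
```

Because the wrapper implements the same interface as the scheduler it wraps, the caller does not need to know which scheduler sits underneath, so different schedulers can be measured with the same driver code.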