[
https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833259#comment-16833259
]
NedaMaleki edited comment on YARN-1021 at 5/5/19 8:10 AM:
----------------------------------------------------------
*Dear Wei Yan,*
*I use Hadoop 2.4.1. When I try to run SLS, I run into the same problem:*
RMTokenSecretManager: Rolling master-key for amrm-tokens
19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStoreEventType
for class
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler
19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class
org.apache.hadoop.yarn.server.resourcemanager.NodesListManagerEventType for
class org.apache.hadoop.yarn.server.resourcemanager.NodesListManager
19/05/05 11:52:32 INFO resourcemanager.ResourceManager: Using Scheduler:
{color:#FF0000}*org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper*{color}
{color:#FF0000}*java.lang.NullPointerException*{color}
{color:#FF0000} *at org.apache.hadoop.yarn.sls.web.SLSWebApp.<init>(SLSWebApp.java:82)*{color}
{color:#FF0000} *at org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.initMetrics(ResourceSchedulerWrapper.java:465)*{color}
{color:#FF0000} *at org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.setConf(ResourceSchedulerWrapper.java:164)*{color}
{color:#FF0000} *at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)*{color}
{color:#FF0000} *at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)*{color}
{color:#FF0000} *at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createScheduler(ResourceManager.java:261)*{color}
{color:#FF0000} *at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:403)*{color}
{color:#FF0000} *at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)*{color}
{color:#FF0000} *at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:824)*{color}
{color:#FF0000} *at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:226)*{color}
{color:#FF0000} *at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)*{color}
{color:#FF0000} *at org.apache.hadoop.yarn.sls.SLSRunner.startRM(SLSRunner.java:163)*{color}
{color:#FF0000} *at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:137)*{color}
{color:#FF0000} *at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:524)*{color}
19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class
org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.SchedulerEventType
for class
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher
19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppEventType for class
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher
19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEventType
for class
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher
19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType for class
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$NodeEventDispatcher
19/05/05 11:52:32 INFO impl.MetricsConfig: loaded properties from
hadoop-metrics2.properties
19/05/05 11:52:32 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10
second(s).
19/05/05 11:52:32 INFO impl.MetricsSystemImpl: ResourceManager metrics system
started
19/05/05 11:52:32 INFO conf.Configuration: found resource
capacity-scheduler.xml at
file:/usr/local/hadoop/etc/hadoop/capacity-scheduler.xml
19/05/05 11:52:32 INFO capacity.ParentQueue: root, capacity=1.0,
asboluteCapacity=1.0, maxCapacity=1.0, asboluteMaxCapacity=1.0, state=RUNNING,
acls=SUBMIT_APPLICATIONS:*ADMINISTER_QUEUE:*
19/05/05 11:52:32 INFO capacity.ParentQueue: Initialized parent-queue root
name=root, fullname=root
19/05/05 11:52:32 INFO capacity.LeafQueue: Initializing default
capacity = 1.0 [= (float) configuredCapacity / 100 ]
asboluteCapacity = 1.0 [= parentAbsoluteCapacity * capacity ]
maxCapacity = 1.0 [= configuredMaxCapacity ]
absoluteMaxCapacity = 1.0 [= 1.0 maximumCapacity undefined,
(parentAbsoluteMaxCapacity * maximumCapacity) / 100 otherwise ]
userLimit = 100 [= configuredUserLimit ]
userLimitFactor = 1.0 [= configuredUserLimitFactor ]
maxApplications = 10000 [= configuredMaximumSystemApplicationsPerQueue or
(int)(configuredMaximumSystemApplications * absoluteCapacity)]
maxApplicationsPerUser = 10000 [= (int)(maxApplications * (userLimit / 100.0f)
* userLimitFactor) ]
maxActiveApplications = 1 [= max((int)ceil((clusterResourceMemory /
minimumAllocation) * maxAMResourcePerQueuePercent * absoluteMaxCapacity),1) ]
maxActiveAppsUsingAbsCap = 1 [= max((int)ceil((clusterResourceMemory /
minimumAllocation) *maxAMResourcePercent * absoluteCapacity),1) ]
maxActiveApplicationsPerUser = 1 [= max((int)(maxActiveApplications *
(userLimit / 100.0f) * userLimitFactor),1) ]
usedCapacity = 0.0 [= usedResourcesMemory / (clusterResourceMemory *
absoluteCapacity)]
absoluteUsedCapacity = 0.0 [= usedResourcesMemory / clusterResourceMemory]
maxAMResourcePerQueuePercent = 0.1 [= configuredMaximumAMResourcePercent ]
minimumAllocationFactor = 0.875 [= (float)(maximumAllocationMemory -
minimumAllocationMemory) / maximumAllocationMemory ]
numContainers = 0 [= currentNumContainers ]
state = RUNNING [= configuredState ]
acls = SUBMIT_APPLICATIONS:*ADMINISTER_QUEUE:* [= configuredAcls ]
nodeLocalityDelay = 40
19/05/05 11:52:32 INFO capacity.CapacityScheduler: Initialized queue: default:
capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>,
usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, numContainers=0
19/05/05 11:52:32 INFO capacity.CapacityScheduler: Initialized queue: root:
numChildQueue= 1, capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0,
vCores:0>usedCapacity=0.0, numApps=0, numContainers=0
19/05/05 11:52:32 INFO capacity.CapacityScheduler: Initialized root queue root:
numChildQueue= 1, capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0,
vCores:0>usedCapacity=0.0, numApps=0, numContainers=0
19/05/05 11:52:32 INFO capacity.CapacityScheduler: Initialized
CapacityScheduler with calculator=class
org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator,
minimumAllocation=<<memory:1024, vCores:1>>, maximumAllocation=<<memory:8192,
vCores:32>>, asynchronousScheduling=false, asyncScheduleInterval=5ms
19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class
org.apache.hadoop.yarn.server.resourcemanager.RMAppManagerEventType for class
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager
19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncherEventType
for class
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher
19/05/05 11:52:33 INFO resourcemanager.RMNMInfo: Registered RMNMInfo MBean
19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class
org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for
class
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler
19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class
org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for
class
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler
19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class
org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for
class
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler
19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class
org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for
class
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler
19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class
org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for
class
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler
19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class
org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for
class
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler
19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class
org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for
class
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler
19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class
org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for
class
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler
19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class
org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for
class
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler
19/05/05 11:52:33 INFO event.AsyncDispatcher: Registering class
org.apache.hadoop.yarn.server.resourcemanager.ahs.WritingHistoryEventType for
class
org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler
19/05/05 11:52:33 INFO util.HostsFileReader: Refreshing hosts (include/exclude)
list
19/05/05 11:52:43 INFO resourcemanager.ResourceManager: Transitioning to active
state
19/05/05 11:52:43 INFO security.RMContainerTokenSecretManager: Rolling
master-key for container-tokens
19/05/05 11:52:43 INFO security.AMRMTokenSecretManager: Rolling master-key for
amrm-tokens
19/05/05 11:52:43 INFO security.NMTokenSecretManagerInRM: Rolling master-key
for nm-tokens
19/05/05 11:52:43 INFO delegation.AbstractDelegationTokenSecretManager:
Updating the current master key for generating delegation tokens
19/05/05 11:52:43 INFO security.RMDelegationTokenSecretManager: storing master
key with keyID 1
19/05/05 11:52:43 INFO delegation.AbstractDelegationTokenSecretManager:
Starting expired delegation token remover thread, tokenRemoverScanInterval=60
min(s)
19/05/05 11:52:43 INFO delegation.AbstractDelegationTokenSecretManager:
Updating the current master key for generating delegation tokens
19/05/05 11:52:43 INFO security.RMDelegationTokenSecretManager: storing master
key with keyID 2
19/05/05 11:52:53 INFO ipc.CallQueueManager: Using callQueue class
java.util.concurrent.LinkedBlockingQueue
19/05/05 11:52:53 INFO ipc.Server: Starting Socket Reader #1 for port 8031
19/05/05 11:52:53 INFO pb.RpcServerFactoryPBImpl: Adding protocol
org.apache.hadoop.yarn.server.api.ResourceTrackerPB to the server
19/05/05 11:52:53 INFO ipc.Server: IPC Server Responder: starting
19/05/05 11:52:53 INFO ipc.Server: IPC Server listener on 8031: starting
19/05/05 11:53:03 INFO ipc.CallQueueManager: Using callQueue class
java.util.concurrent.LinkedBlockingQueue
19/05/05 11:53:03 INFO ipc.Server: Starting Socket Reader #1 for port 8030
19/05/05 11:53:03 INFO pb.RpcServerFactoryPBImpl: Adding protocol
org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB to the server
19/05/05 11:53:03 INFO ipc.Server: IPC Server listener on 8030: starting
19/05/05 11:53:03 INFO ipc.Server: IPC Server Responder: starting
19/05/05 11:53:13 INFO ipc.CallQueueManager: Using callQueue class
java.util.concurrent.LinkedBlockingQueue
19/05/05 11:53:13 INFO ipc.Server: Starting Socket Reader #1 for port 8032
19/05/05 11:53:13 INFO pb.RpcServerFactoryPBImpl: Adding protocol
org.apache.hadoop.yarn.api.ApplicationClientProtocolPB to the server
19/05/05 11:53:13 INFO ipc.Server: IPC Server Responder: starting
19/05/05 11:53:13 INFO ipc.Server: IPC Server listener on 8032: starting
19/05/05 11:53:14 INFO resourcemanager.ResourceManager: Transitioned to active
state
19/05/05 11:53:14 INFO mortbay.log: Logging to
org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
19/05/05 11:53:14 INFO http.HttpRequestLog: Http request log for
http.requests.resourcemanager is not defined
19/05/05 11:53:14 INFO http.HttpServer2: Added global filter 'safety'
(class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
19/05/05 11:53:14 INFO http.HttpServer2: Added filter static_user_filter
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to
context cluster
19/05/05 11:53:14 INFO http.HttpServer2: Added filter static_user_filter
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to
context logs
19/05/05 11:53:14 INFO http.HttpServer2: Added filter static_user_filter
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to
context static
19/05/05 11:53:14 INFO http.HttpServer2: adding path spec: /cluster/*
19/05/05 11:53:14 INFO http.HttpServer2: adding path spec: /ws/*
19/05/05 11:53:14 INFO http.HttpServer2: Jetty bound to port 8088
19/05/05 11:53:14 INFO mortbay.log: jetty-6.1.26
19/05/05 11:53:14 INFO mortbay.log: Extract
jar:file:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-common-2.4.1.jar!/webapps/cluster
to /tmp/Jetty_0_0_0_0_8088_cluster____u0rgz3/webapp
19/05/05 11:53:14 INFO mortbay.log: Started [email protected]:8088
19/05/05 11:53:14 INFO webapp.WebApps: Web app /cluster started at 8088
19/05/05 11:53:14 INFO webapp.WebApps: Registered webapp guice modules
19/05/05 11:53:25 INFO ipc.CallQueueManager: Using callQueue class
java.util.concurrent.LinkedBlockingQueue
19/05/05 11:53:25 INFO ipc.Server: Starting Socket Reader #1 for port 8033
19/05/05 11:53:25 INFO pb.RpcServerFactoryPBImpl: Adding protocol
org.apache.hadoop.yarn.server.api.ResourceManagerAdministrationProtocolPB to
the server
19/05/05 11:53:25 INFO ipc.Server: IPC Server Responder: starting
19/05/05 11:53:25 INFO ipc.Server: IPC Server listener on 8033: starting
19/05/05 11:53:45 INFO util.RackResolver: Resolved a2115.smile.com to
/default-rack
19/05/05 11:53:45 INFO resourcemanager.ResourceTrackerService: NodeManager from
node a2115.smile.com(cmPort: 0 httpPort: 80) registered with capability:
<memory:10240, vCores:10>, assigned nodeId a2115.smile.com:0
19/05/05 11:53:45 INFO rmnode.RMNodeImpl: a2115.smile.com:0 Node Transitioned
from NEW to RUNNING
19/05/05 11:53:45 INFO capacity.CapacityScheduler: Added node a2115.smile.com:0
clusterResource: <memory:10240, vCores:10>
19/05/05 11:54:05 INFO util.RackResolver: Resolved a2118.smile.com to
/default-rack
19/05/05 11:54:05 INFO resourcemanager.ResourceTrackerService: NodeManager from
node a2118.smile.com(cmPort: 1 httpPort: 80) registered with capability:
<memory:10240, vCores:10>, assigned nodeId a2118.smile.com:1
19/05/05 11:54:05 INFO rmnode.RMNodeImpl: a2118.smile.com:1 Node Transitioned
from NEW to RUNNING
19/05/05 11:54:05 INFO capacity.CapacityScheduler: Added node a2118.smile.com:1
clusterResource: <memory:20480, vCores:20>
19/05/05 11:54:25 INFO util.RackResolver: Resolved a2117.smile.com to
/default-rack
19/05/05 11:54:25 INFO resourcemanager.ResourceTrackerService: NodeManager from
node a2117.smile.com(cmPort: 2 httpPort: 80) registered with capability:
<memory:10240, vCores:10>, assigned nodeId a2117.smile.com:2
19/05/05 11:54:25 INFO rmnode.RMNodeImpl: a2117.smile.com:2 Node Transitioned
from NEW to RUNNING
19/05/05 11:54:25 INFO capacity.CapacityScheduler: Added node a2117.smile.com:2
clusterResource: <memory:30720, vCores:30>
19/05/05 11:54:45 INFO util.RackResolver: Resolved a2116.smile.com to
/default-rack
19/05/05 11:54:45 INFO resourcemanager.ResourceTrackerService: NodeManager from
node a2116.smile.com(cmPort: 3 httpPort: 80) registered with capability:
<memory:10240, vCores:10>, assigned nodeId a2116.smile.com:3
19/05/05 11:54:45 INFO rmnode.RMNodeImpl: a2116.smile.com:3 Node Transitioned
from NEW to RUNNING
{color:#FF0000}19/05/05 11:54:45 INFO capacity.CapacityScheduler: Added node a2116.smile.com:3 clusterResource: <memory:40960, vCores:40>{color}
{color:#FF0000}Exception in thread "main" java.lang.RuntimeException: java.lang.NullPointerException{color}
{color:#FF0000} at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131){color}
{color:#FF0000} at org.apache.hadoop.yarn.sls.SLSRunner.startAMFromRumenTraces(SLSRunner.java:394){color}
{color:#FF0000} at org.apache.hadoop.yarn.sls.SLSRunner.startAM(SLSRunner.java:246){color}
{color:#FF0000} at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:141){color}
{color:#FF0000} at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:524){color}
{color:#FF0000}Caused by: java.lang.NullPointerException{color}
{color:#FF0000} at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936){color}
{color:#FF0000} at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:123){color}
{color:#FF0000} ... 4 more{color}
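The innermost frame of the second trace is ConcurrentHashMap.get, which throws a NullPointerException whenever it is called with a null key. The sketch below reproduces only that failure mode in isolation. It is not the Hadoop code: the class name, both maps, and the job-type strings are hypothetical, and whether a null class lookup is what actually happened in this run is an assumption that would need to be checked against the SLS configuration and the input trace.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class NullKeyLookupDemo {
    // Hypothetical cache keyed by class, standing in for a per-class constructor cache.
    private static final Map<Class<?>, String> CACHE = new ConcurrentHashMap<>();

    // Hypothetical job-type-to-class table; "mapreduce" is deliberately left unconfigured.
    private static final Map<String, Class<?>> AM_CLASSES = new HashMap<>();

    static {
        AM_CLASSES.put("known", String.class);
    }

    static String describe(String jobType) {
        // HashMap.get returns null for a key that was never configured.
        Class<?> clazz = AM_CLASSES.get(jobType);
        try {
            // ConcurrentHashMap does not permit null keys, so get(null) throws.
            CACHE.get(clazz);
            return "cache lookup ok for " + jobType;
        } catch (NullPointerException e) {
            return "NullPointerException for unconfigured type " + jobType;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("known"));
        System.out.println(describe("mapreduce"));
    }
}
```

If the real cause is similar, checking which class the simulator resolves for the job type found in the trace would be a reasonable first debugging step.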
*After waiting a few minutes, I got the following messages and then nothing more :(*
19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor:
Expired:a2115.smile.com:0 Timed out after 600 secs
19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor:
Expired:a2118.smile.com:1 Timed out after 600 secs
19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor:
Expired:a2117.smile.com:2 Timed out after 600 secs
19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor:
Expired:a2116.smile.com:3 Timed out after 600 secs
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2115.smile.com:0
as it is now LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2115.smile.com:0 Node Transitioned
from RUNNING to LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2118.smile.com:1
as it is now LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2118.smile.com:1 Node Transitioned
from RUNNING to LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2117.smile.com:2
as it is now LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2117.smile.com:2 Node Transitioned
from RUNNING to LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2116.smile.com:3
as it is now LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2116.smile.com:3 Node Transitioned
from RUNNING to LOST
19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node
a2115.smile.com:0 clusterResource: <memory:30720, vCores:30>
19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node
a2118.smile.com:1 clusterResource: <memory:20480, vCores:20>
19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node
a2117.smile.com:2 clusterResource: <memory:10240, vCores:10>
19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node
a2116.smile.com:3 clusterResource: <memory:0, vCores:0>
*1) I look forward to hearing from you, as I am stuck here!*
*2) My second question: how can I extend SLS? That is, where should I
write my scheduler code in SLS so that I can run it and get results? (I need to
simulate my scheduler and then compare it with other schedulers such as FIFO,
Fair, and Capacity.)*
*Thanks a lot,*
*Neda*
> Yarn Scheduler Load Simulator
> -----------------------------
>
> Key: YARN-1021
> URL: https://issues.apache.org/jira/browse/YARN-1021
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: scheduler
> Reporter: Wei Yan
> Assignee: Wei Yan
> Priority: Major
> Fix For: 2.3.0
>
> Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz,
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch,
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch,
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch,
> YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf
>
>
> The Yarn Scheduler is a fertile area of interest with different
> implementations, e.g., Fifo, Capacity and Fair schedulers. Meanwhile,
> several optimizations have also been made to improve scheduler performance
> for different scenarios and workloads. Each scheduler algorithm has its own
> set of features, and drives scheduling decisions by many factors, such as
> fairness, capacity guarantee, resource availability, etc. It is very
> important to evaluate a scheduler algorithm thoroughly before we deploy it in
> a production cluster. Unfortunately, it is currently non-trivial to evaluate
> a scheduling algorithm. Evaluating in a real cluster is always costly and
> time-consuming, and it is also very hard to find a large-enough cluster.
> Hence, a simulator that can predict how well a scheduler algorithm performs
> for a specific workload would be quite useful.
> We want to build a Scheduler Load Simulator to simulate large-scale Yarn
> clusters and application loads on a single machine. This would be invaluable
> in furthering Yarn by providing a tool for researchers and developers to
> prototype new scheduler features and predict their behavior and performance
> with a reasonable amount of confidence, thereby aiding rapid innovation.
> The simulator will exercise the real Yarn ResourceManager, removing the
> network factor by simulating NodeManagers and ApplicationMasters via handling
> and dispatching NM/AM heartbeat events from within the same JVM.
> To keep track of scheduler behavior and performance, a scheduler wrapper
> will wrap the real scheduler.
> The simulator will produce real time metrics while executing, including:
> * Resource usage for the whole cluster and each queue, which can be used to
> configure the capacity of the cluster and its queues.
> * The detailed application execution trace (recorded in relation to simulated
> time), which can be analyzed to understand/validate the scheduler behavior
> (individual job turnaround time, throughput, fairness, capacity guarantee,
> etc).
> * Several key metrics of scheduler algorithm, such as time cost of each
> scheduler operation (allocate, handle, etc), which can be utilized by Hadoop
> developers to find the code spots and scalability limits.
> The simulator will provide real time charts showing the behavior of the
> scheduler and its performance.
> A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE,
> showing how to use the simulator to simulate the Fair Scheduler and the
> Capacity Scheduler.
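The scheduler wrapper mentioned in the description can be pictured as a decorator that forwards each call to the real scheduler while recording how long it took. The following is a minimal sketch of that idea only. The SimpleScheduler interface, the FifoLikeScheduler class, and the TimingWrapper class are hypothetical names invented for illustration; they are not the YARN ResourceScheduler API or the SLS ResourceSchedulerWrapper.

```java
public class SchedulerWrapperSketch {
    /** Hypothetical minimal scheduler interface; not the YARN API. */
    interface SimpleScheduler {
        String allocate(String appId, int containers);
    }

    /** Hypothetical stand-in for a real scheduler implementation. */
    static class FifoLikeScheduler implements SimpleScheduler {
        @Override
        public String allocate(String appId, int containers) {
            return appId + ":" + containers;
        }
    }

    /** Decorator that delegates to the wrapped scheduler and records time cost. */
    static class TimingWrapper implements SimpleScheduler {
        private final SimpleScheduler delegate;
        private long calls;
        private long totalNanos;

        TimingWrapper(SimpleScheduler delegate) {
            this.delegate = delegate;
        }

        @Override
        public String allocate(String appId, int containers) {
            long start = System.nanoTime();
            try {
                // The scheduling decision itself is left entirely to the delegate.
                return delegate.allocate(appId, containers);
            } finally {
                // Record the cost even if the delegate throws.
                totalNanos += System.nanoTime() - start;
                calls++;
            }
        }

        long getCalls() {
            return calls;
        }

        long getTotalNanos() {
            return totalNanos;
        }
    }

    public static void main(String[] args) {
        TimingWrapper wrapper = new TimingWrapper(new FifoLikeScheduler());
        wrapper.allocate("app_1", 2);
        wrapper.allocate("app_2", 4);
        System.out.println("calls=" + wrapper.getCalls()
            + " totalNanos=" + wrapper.getTotalNanos());
    }
}
```

Because the wrapper implements the same interface as the scheduler it wraps, a different scheduler can be swapped in without changing the measurement code, which is what makes side-by-side comparison of schedulers straightforward.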