Evan Tepsic created YARN-8014:
---------------------------------

Summary: YARN ResourceManager Lists A NodeManager As RUNNING & SHUTDOWN Simultaneously
Key: YARN-8014
URL: https://issues.apache.org/jira/browse/YARN-8014
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.8.2
Reporter: Evan Tepsic
A graceful shutdown and subsequent startup of a NodeManager process on YARN/HDFS v2.8.2 seems to successfully place the Node back into the RUNNING state. However, the ResourceManager appears to also keep the Node in the SHUTDOWN state.

*Steps To Reproduce:*
1. SSH to the host running the NodeManager.
2. Switch to the user ID that the NodeManager runs as (hadoop).
3. Execute cmd: /opt/hadoop/sbin/yarn-daemon.sh stop nodemanager
4. Wait for the NodeManager process to terminate gracefully.
5. Confirm the Node is in SHUTDOWN state via: http://rb01rm01.local:8088/cluster/nodes
6. Execute cmd: /opt/hadoop/sbin/yarn-daemon.sh start nodemanager
7. Confirm the Node is in RUNNING state via: http://rb01rm01.local:8088/cluster/nodes

*Investigation:*
1. Review the contents of the ResourceManager and NodeManager log files:

+ResourceManager log-file:+
2018-03-08 08:15:44,085 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Node with node id : rb0101.local:43892 has shutdown, hence unregistering the node.
2018-03-08 08:15:44,092 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating Node rb0101.local:43892 as it is now SHUTDOWN
2018-03-08 08:15:44,092 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: rb0101.local:43892 Node Transitioned from RUNNING to SHUTDOWN
2018-03-08 08:15:44,093 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Removed node rb0101.local:43892 cluster capacity: <memory:110592, vCores:54>
2018-03-08 08:16:08,915 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: NodeManager from node rb0101.local(cmPort: 42627 httpPort: 8042) registered with capability: <memory:12288, vCores:6>, assigned nodeId rb0101.local:42627
2018-03-08 08:16:08,916 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: rb0101.local:42627 Node Transitioned from NEW to RUNNING
2018-03-08 08:16:08,916 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Added node rb0101.local:42627 cluster capacity: <memory:122880, vCores:60>
2018-03-08 08:16:34,826 WARN org.apache.hadoop.ipc.Server: Large response size 2976014 for call Call#428958 Retry#0 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplications from 192.168.1.100:44034

+NodeManager log-file:+
2018-03-08 08:00:14,500 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Cache Size Before Clean: 10720046250, Total Deleted: 0, Public Deleted: 0, Private Deleted: 0
2018-03-08 08:10:14,498 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Cache Size Before Clean: 10720046250, Total Deleted: 0, Public Deleted: 0, Private Deleted: 0
2018-03-08 08:15:44,048 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL 15: SIGTERM
2018-03-08 08:15:44,101 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Successfully Unregistered the Node rb0101.local:43892 with ResourceManager.
2018-03-08 08:15:44,114 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:8042
2018-03-08 08:15:44,226 INFO org.apache.hadoop.ipc.Server: Stopping server on 43892
2018-03-08 08:15:44,232 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 43892
2018-03-08 08:15:44,237 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2018-03-08 08:15:44,239 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService: org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService waiting for pending aggregation during exit
2018-03-08 08:15:44,242 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting.
2018-03-08 08:15:44,284 INFO org.apache.hadoop.ipc.Server: Stopping server on 8040
2018-03-08 08:15:44,285 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8040
2018-03-08 08:15:44,285 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2018-03-08 08:15:44,287 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Public cache exiting
2018-03-08 08:15:44,289 WARN org.apache.hadoop.yarn.server.nodemanager.NodeResourceMonitorImpl: org.apache.hadoop.yarn.server.nodemanager.NodeResourceMonitorImpl is interrupted. Exiting.
2018-03-08 08:15:44,294 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NodeManager metrics system...
2018-03-08 08:15:44,295 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system stopped.
2018-03-08 08:15:44,296 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system shutdown complete.
2018-03-08 08:15:44,297 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NodeManager at rb0101.local/192.168.1.101
************************************************************/
2018-03-08 08:16:01,905 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NodeManager
STARTUP_MSG:   user = hadoop
STARTUP_MSG:   host = rb0101.local/192.168.1.101
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 2.8.2
STARTUP_MSG:   classpath = blahblahblah (truncated for size-purposes)
STARTUP_MSG:   build = Unknown -r Unknown; compiled by 'root' on 2017-09-14T18:22Z
STARTUP_MSG:   java = 1.8.0_144
************************************************************/
2018-03-08 08:16:01,918 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: registered UNIX signal handlers for [TERM, HUP, INT]
2018-03-08 08:16:03,202 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: Node Manager health check script is not available or doesn't have execute permission, so not starting the node health script runner.
2018-03-08 08:16:03,321 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher
2018-03-08 08:16:03,322 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher
2018-03-08 08:16:03,323 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.event.LocalizationEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService
2018-03-08 08:16:03,323 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServicesEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices
2018-03-08 08:16:03,324 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
2018-03-08 08:16:03,324 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncherEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher
2018-03-08 08:16:03,347 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.ContainerManagerEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl
2018-03-08 08:16:03,348 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.NodeManagerEventType for class org.apache.hadoop.yarn.server.nodemanager.NodeManager
2018-03-08 08:16:03,402 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2018-03-08 08:16:03,484 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2018-03-08 08:16:03,484 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system started
2018-03-08 08:16:03,561 INFO org.apache.hadoop.yarn.server.nodemanager.NodeResourceMonitorImpl: Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.ResourceCalculatorPlugin@4b8729ff
2018-03-08 08:16:03,564 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.event.LogHandlerEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService
2018-03-08 08:16:03,565 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploadEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploadService
2018-03-08 08:16:03,565 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: AMRMProxyService is disabled
2018-03-08 08:16:03,566 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: per directory file limit = 8192
2018-03-08 08:16:03,621 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: usercache path : file:/space/hadoop/tmp/nm-local-dir/usercache_DEL_1520518563569
2018-03-08 08:16:03,667 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting path : file:/space/hadoop/tmp/nm-local-dir/usercache_DEL_1520518563569/user1
2018-03-08 08:16:03,667 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting path : file:/space/hadoop/tmp/nm-local-dir/usercache_DEL_1520518563569/user2
2018-03-08 08:16:03,668 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting path : file:/space/hadoop/tmp/nm-local-dir/usercache_DEL_1520518563569/user3
2018-03-08 08:16:03,681 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting path : file:/space/hadoop/tmp/nm-local-dir/usercache_DEL_1520518563569/user4
2018-03-08 08:16:03,739 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.event.LocalizerEventType for class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker
2018-03-08 08:16:03,793 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Adding auxiliary service mapreduce_shuffle, "mapreduce_shuffle"
2018-03-08 08:16:03,826 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.ResourceCalculatorPlugin@1187c9e8
2018-03-08 08:16:03,826 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Using ResourceCalculatorProcessTree : null
2018-03-08 08:16:03,827 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Physical memory check enabled: true
2018-03-08 08:16:03,827 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Virtual memory check enabled: true
2018-03-08 08:16:03,832 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: ContainersMonitor enabled: true
2018-03-08 08:16:03,841 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Nodemanager resources: memory set to 12288MB.
2018-03-08 08:16:03,841 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Nodemanager resources: vcores set to 6.
2018-03-08 08:16:03,846 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Initialized nodemanager with : physical-memory=12288 virtual-memory=25805 virtual-cores=6
2018-03-08 08:16:03,850 INFO org.apache.hadoop.util.JvmPauseMonitor: Starting JVM pause monitor
2018-03-08 08:16:03,908 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 2000 scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
2018-03-08 08:16:03,932 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 42627
2018-03-08 08:16:04,153 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.api.ContainerManagementProtocolPB to the server
2018-03-08 08:16:04,153 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Blocking new container-requests as container manager rpc server is still starting.
2018-03-08 08:16:04,154 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2018-03-08 08:16:04,154 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 42627: starting
2018-03-08 08:16:04,166 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Updating node address : rb0101.local:42627
2018-03-08 08:16:04,183 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 500 scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
2018-03-08 08:16:04,184 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8040
2018-03-08 08:16:04,191 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB to the server
2018-03-08 08:16:04,191 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2018-03-08 08:16:04,191 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8040: starting
2018-03-08 08:16:04,192 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localizer started on port 8040
2018-03-08 08:16:04,312 INFO org.apache.hadoop.mapred.IndexCache: IndexCache created with max memory = 10485760
2018-03-08 08:16:04,330 INFO org.apache.hadoop.mapred.ShuffleHandler: mapreduce_shuffle listening on port 13562
2018-03-08 08:16:04,337 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: ContainerManager started at rb0101.local/192.168.1.101:42627
2018-03-08 08:16:04,337 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: ContainerManager bound to 0.0.0.0/0.0.0.0:0
2018-03-08 08:16:04,340 INFO org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer: Instantiating NMWebApp at 0.0.0.0:8042
2018-03-08 08:16:04,427 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2018-03-08 08:16:04,436 INFO org.apache.hadoop.security.authentication.server.AuthenticationFilter: Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2018-03-08 08:16:04,442 INFO org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.nodemanager is not defined
2018-03-08 08:16:04,450 INFO org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2018-03-08 08:16:04,461 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context node
2018-03-08 08:16:04,462 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2018-03-08 08:16:04,462 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2018-03-08 08:16:04,462 INFO org.apache.hadoop.security.HttpCrossOriginFilterInitializer: CORS filter not enabled. Please set hadoop.http.cross-origin.enabled to 'true' to enable it
2018-03-08 08:16:04,465 INFO org.apache.hadoop.http.HttpServer2: adding path spec: /node/*
2018-03-08 08:16:04,465 INFO org.apache.hadoop.http.HttpServer2: adding path spec: /ws/*
2018-03-08 08:16:04,843 INFO org.apache.hadoop.yarn.webapp.WebApps: Registered webapp guice modules
2018-03-08 08:16:04,846 INFO org.apache.hadoop.http.HttpServer2: Jetty bound to port 8042
2018-03-08 08:16:04,846 INFO org.mortbay.log: jetty-6.1.26
2018-03-08 08:16:04,877 INFO org.mortbay.log: Extract jar:file:/opt/hadoop-2.8.2/share/hadoop/yarn/hadoop-yarn-common-2.8.2.jar!/webapps/node to /tmp/Jetty_0_0_0_0_8042_node____19tj0x/webapp
2018-03-08 08:16:08,355 INFO org.mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:8042
2018-03-08 08:16:08,356 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app node started at 8042
2018-03-08 08:16:08,473 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node ID assigned is : rb0101.local:42627
2018-03-08 08:16:08,498 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at rb01rm01.local/192.168.1.100:8031
2018-03-08 08:16:08,613 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out 0 NM container statuses: []
2018-03-08 08:16:08,621 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registering with RM using containers :[]
2018-03-08 08:16:08,934 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Rolling master-key for container-tokens, got key with id -2086472604
2018-03-08 08:16:08,938 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: Rolling master-key for container-tokens, got key with id -426187560
2018-03-08 08:16:08,939 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as rb0101.local:42627 with total resource of <memory:12288, vCores:6>
2018-03-08 08:16:08,939 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying ContainerManager to unblock new container-requests
2018-03-08 08:26:04,174 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Cache Size Before Clean: 0, Total Deleted: 0, Public Deleted: 0, Private Deleted: 0
2018-03-08 08:36:04,170 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Cache Size Before Clean: 0, Total Deleted: 0, Public Deleted: 0, Private Deleted: 0
2018-03-08 08:46:04,170 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Cache Size Before Clean: 0, Total Deleted: 0, Public Deleted: 0, Private Deleted: 0

2. Listing YARN's Nodes shows that rb0101.local was returned to the RUNNING state. However, listing all nodes (-all) shows the host in two states, RUNNING and SHUTDOWN:

[hadoop@rb01rm01 logs]$ /opt/hadoop/bin/yarn node -list -all
18/03/08 09:20:33 INFO client.RMProxy: Connecting to ResourceManager at rb01rm01.local/192.168.1.100:8032
18/03/08 09:20:34 INFO client.AHSProxy: Connecting to Application History server at rb01rm01.local/192.168.1.100:10200
Total Nodes:11
Node-Id             Node-State  Node-Http-Address  Number-of-Running-Containers
rb0106.local:44160  RUNNING     rb0106.local:8042  0
rb0105.local:32832  RUNNING     rb0105.local:8042  0
rb0101.local:42627  RUNNING     rb0101.local:8042  0
rb0108.local:38209  RUNNING     rb0108.local:8042  0
rb0107.local:34306  RUNNING     rb0107.local:8042  0
rb0102.local:43063  RUNNING     rb0102.local:8042  0
rb0103.local:42374  RUNNING     rb0103.local:8042  0
rb0109.local:37455  RUNNING     rb0109.local:8042  0
rb0110.local:36690  RUNNING     rb0110.local:8042  0
rb0104.local:33268  RUNNING     rb0104.local:8042  0
rb0101.local:43892  SHUTDOWN    rb0101.local:8042  0
[hadoop@rb01rm01 logs]$
[hadoop@rb01rm01 logs]$ /opt/hadoop/bin/yarn node -list -states RUNNING
18/03/08 09:20:55 INFO client.RMProxy: Connecting to ResourceManager at rb01rm01.local/192.168.1.100:8032
18/03/08 09:20:56 INFO client.AHSProxy: Connecting to Application History server at rb01rm01.local/192.168.1.100:10200
Total Nodes:10
Node-Id             Node-State  Node-Http-Address  Number-of-Running-Containers
rb0106.local:44160  RUNNING     rb0106.local:8042  0
rb0105.local:32832  RUNNING     rb0105.local:8042  0
rb0101.local:42627  RUNNING     rb0101.local:8042  0
rb0108.local:38209  RUNNING     rb0108.local:8042  0
rb0107.local:34306  RUNNING     rb0107.local:8042  0
rb0102.local:43063  RUNNING     rb0102.local:8042  0
rb0103.local:42374  RUNNING     rb0103.local:8042  0
rb0109.local:37455  RUNNING     rb0109.local:8042  0
rb0110.local:36690  RUNNING     rb0110.local:8042  0
rb0104.local:33268  RUNNING     rb0104.local:8042  0
[hadoop@rb01rm01 logs]$ /opt/hadoop/bin/yarn node -list -states SHUTDOWN
18/03/08 09:21:01 INFO client.RMProxy: Connecting to ResourceManager at rb01rm01.local/192.168.1.100:8032
18/03/08 09:21:01 INFO client.AHSProxy: Connecting to Application History server at rb01rm01.local/192.168.1.100:10200
Total Nodes:0
Node-Id             Node-State  Node-Http-Address  Number-of-Running-Containers
[hadoop@rb01rm01 logs]$

3. However, the ResourceManager does not list Node rb0101.local as SHUTDOWN when the list of Nodes in the SHUTDOWN state is requested explicitly:

[hadoop@rb01rm01 bin]$ /opt/hadoop/bin/yarn node -list -states SHUTDOWN
18/03/08 08:28:23 INFO client.RMProxy: Connecting to ResourceManager at rb01rm01.local/v.x.y.z:8032
18/03/08 08:28:24 INFO client.AHSProxy: Connecting to Application History server at rb01rm01.local/v.x.y.z:10200
Total Nodes:0
Node-Id             Node-State  Node-Http-Address  Number-of-Running-Containers
[hadoop@rb01rm01 bin]$
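The discrepancy between items 2 and 3 can be checked mechanically from the CLI output. Below is a minimal Python sketch, not part of Hadoop, that parses the plain-text table printed by `yarn node -list`. It assumes the four-column layout shown in this report, groups Node-Ids by host, and lists Node-Ids that `-list -all` reports as SHUTDOWN but `-list -states SHUTDOWN` omits. The function and constant names are hypothetical, and the sample strings are abridged from the outputs above. The two rb0101.local entries carry different ports (43892 before the restart, 42627 after), so the ResourceManager appears to be tracking them as two distinct Node-Ids.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical sample, abridged from the `yarn node -list -all` output in this
# report (only three of the eleven rows are kept; whitespace is normalized).
ALL_OUTPUT = """\
Node-Id             Node-State  Node-Http-Address  Number-of-Running-Containers
rb0106.local:44160  RUNNING     rb0106.local:8042  0
rb0101.local:42627  RUNNING     rb0101.local:8042  0
rb0101.local:43892  SHUTDOWN    rb0101.local:8042  0
"""

# Hypothetical sample of the `yarn node -list -states SHUTDOWN` output in this report.
SHUTDOWN_OUTPUT = """\
Total Nodes:0
Node-Id             Node-State  Node-Http-Address  Number-of-Running-Containers
"""


def parse_node_list(output: str) -> list[tuple[str, str]]:
    """Return (node_id, state) pairs from the text table printed by `yarn node -list`.

    Assumes the layout shown in this report: a header row starting with 'Node-Id',
    followed by one whitespace-separated, four-field row per node whose first
    field is host:port. Other lines (INFO messages, shell prompts) are skipped.
    """
    rows = []
    in_table = False
    for line in output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Node-Id":
            in_table = True  # header row; data rows follow
        elif in_table and len(fields) == 4 and ":" in fields[0]:
            rows.append((fields[0], fields[1]))
    return rows


def hosts_with_multiple_ids(rows: list[tuple[str, str]]) -> dict[str, list[tuple[str, str]]]:
    """Group Node-Ids by host and keep only hosts listed under more than one port."""
    by_host: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for node_id, state in rows:
        host, _, port = node_id.rpartition(":")
        by_host[host].append((port, state))
    return {host: sorted(entries) for host, entries in by_host.items() if len(entries) > 1}


def missing_from_state_query(all_rows, state_rows, state: str) -> list[str]:
    """Node-Ids that `-list -all` reports in `state` but `-list -states <state>` omits."""
    listed = {node_id for node_id, _ in state_rows}
    return sorted(node_id for node_id, s in all_rows if s == state and node_id not in listed)


if __name__ == "__main__":
    all_rows = parse_node_list(ALL_OUTPUT)
    print(hosts_with_multiple_ids(all_rows))
    # {'rb0101.local': [('42627', 'RUNNING'), ('43892', 'SHUTDOWN')]}
    print(missing_from_state_query(all_rows, parse_node_list(SHUTDOWN_OUTPUT), "SHUTDOWN"))
    # ['rb0101.local:43892']
```

Run against the full `-list -all` and `-list -states SHUTDOWN` outputs from item 2, the sketch should flag the same rb0101.local:43892 entry. It compares whole Node-Ids rather than hostnames because the RUNNING and SHUTDOWN entries differ only in their port.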