[jira] [Resolved] (YARN-8014) YARN ResourceManager Lists A NodeManager As RUNNING & SHUTDOWN Simultaneously

2018-03-08 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt resolved YARN-8014.

Resolution: Invalid

Closing as invalid

> YARN ResourceManager Lists A NodeManager As RUNNING & SHUTDOWN Simultaneously
> -----------------------------------------------------------------------------
>
> Key: YARN-8014
> URL: https://issues.apache.org/jira/browse/YARN-8014
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.2
>Reporter: Evan Tepsic
>Priority: Minor
>
> A graceful shutdown and subsequent startup of a NodeManager process on
> YARN/HDFS v2.8.2 appears to place the Node back into the RUNNING state.
> However, the ResourceManager also continues to list the Node in the SHUTDOWN
> state.
>  
> *Steps To Reproduce:*
> 1. SSH to the host running the NodeManager.
> 2. Switch to the user ID that the NodeManager runs as (hadoop).
> 3. Execute cmd: /opt/hadoop/sbin/yarn-daemon.sh stop nodemanager
> 4. Wait for the NodeManager process to terminate gracefully.
> 5. Confirm the Node is in SHUTDOWN state via:
> [http://rb01rm01.local:8088/cluster/nodes]
> 6. Execute cmd: /opt/hadoop/sbin/yarn-daemon.sh start nodemanager
> 7. Confirm the Node is in RUNNING state (a CLI cross-check is sketched below) via:
> [http://rb01rm01.local:8088/cluster/nodes]
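> 
> As a cross-check on the web UI, the node states can also be listed with the
> YARN CLI (a minimal sketch, assuming a stock Hadoop 2.8 client with
> HADOOP_CONF_DIR pointing at this cluster):
> 
>  # List every node the ResourceManager currently tracks, including inactive ones
>  yarn node -list -all
> 
>  # Or restrict the listing to the two states in question
>  yarn node -list -states RUNNING,SHUTDOWN
> 
> After step 6, the same host is expected to show up twice, once per state, each
> entry under its own nodeId (hostname:port).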
>  
> *Investigation:*
>  1. Review contents of ResourceManager + NodeManager log-files:
> +ResourceManager log-file:+
>  2018-03-08 08:15:44,085 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Node 
> with node id : rb0101.local:43892 has shutdown, hence unregistering the node.
>  2018-03-08 08:15:44,092 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
> Node rb0101.local:43892 as it is now SHUTDOWN
>  2018-03-08 08:15:44,092 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> rb0101.local:43892 Node Transitioned from RUNNING to SHUTDOWN
>  2018-03-08 08:15:44,093 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Removed node rb0101.local:43892 cluster capacity: 
>  2018-03-08 08:16:08,915 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> NodeManager from node rb0101.local(cmPort: 42627 httpPort: 8042) registered 
> with capability: , assigned nodeId rb0101.local:42627
>  2018-03-08 08:16:08,916 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> rb0101.local:42627 Node Transitioned from NEW to RUNNING
>  2018-03-08 08:16:08,916 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Added node rb0101.local:42627 cluster capacity: 
>  2018-03-08 08:16:34,826 WARN org.apache.hadoop.ipc.Server: Large response 
> size 2976014 for call Call#428958 Retry#0 
> org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplications from 
> 192.168.1.100:44034
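> 
> Note that after the restart the node re-registers under a different RPC port
> (rb0101.local:43892 before, rb0101.local:42627 after), i.e. with a new nodeId.
> If a stable nodeId across NodeManager restarts is desired, the RPC port can be
> pinned in yarn-site.xml rather than left to default to an ephemeral port (a
> minimal sketch; the port value 45454 is only illustrative, not taken from this
> cluster):
> 
>  <property>
>    <!-- Fixed container-manager (RPC) address for the NodeManager -->
>    <name>yarn.nodemanager.address</name>
>    <value>0.0.0.0:45454</value>
>  </property>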
>  
> +NodeManager log-file:+
>  2018-03-08 08:00:14,500 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Cache Size Before Clean: 10720046250, Total Deleted: 0, Public
>  Deleted: 0, Private Deleted: 0
>  2018-03-08 08:10:14,498 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Cache Size Before Clean: 10720046250, Total Deleted: 0, Public
>  Deleted: 0, Private Deleted: 0
>  2018-03-08 08:15:44,048 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL 15: 
> SIGTERM
>  2018-03-08 08:15:44,101 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Successfully 
> Unregistered the Node rb0101.local:43892 with ResourceManager.
>  2018-03-08 08:15:44,114 INFO org.mortbay.log: Stopped 
> HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:8042
>  2018-03-08 08:15:44,226 INFO org.apache.hadoop.ipc.Server: Stopping server 
> on 43892
>  2018-03-08 08:15:44,232 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server listener on 43892
>  2018-03-08 08:15:44,237 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server Responder
>  2018-03-08 08:15:44,239 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService:
>  org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService
>  waiting for pending aggregation during exit
>  2018-03-08 08:15:44,242 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
>  is interrupted. Exiting.
>  2018-03-08 08:15:44,284 INFO org.apache.hadoop.ipc.Server: Stopping server 
> on 8040
>  2018-03-08 08:15:44,285 INFO 

[jira] [Resolved] (YARN-8014) YARN ResourceManager Lists A NodeManager As RUNNING & SHUTDOWN Simultaneously

2018-03-08 Thread Evan Tepsic (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Evan Tepsic resolved YARN-8014.
---
Resolution: Fixed
