[ https://issues.apache.org/jira/browse/YARN-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735545#comment-13735545 ]
Steve Loughran commented on YARN-1054: -------------------------------------- logs with stack {code} 2013-08-09 16:46:31,208 [main] INFO yarn.cluster.YarnMiniClusterTestBase (YarnMiniClusterTestBase.groovy:describe(121)) - 2013-08-09 16:46:31,208 [main] INFO yarn.cluster.YarnMiniClusterTestBase (YarnMiniClusterTestBase.groovy:describe(122)) - =============================== 2013-08-09 16:46:31,208 [main] INFO yarn.cluster.YarnMiniClusterTestBase (YarnMiniClusterTestBase.groovy:describe(123)) - teardown 2013-08-09 16:46:31,208 [main] INFO yarn.cluster.YarnMiniClusterTestBase (YarnMiniClusterTestBase.groovy:describe(124)) - =============================== 2013-08-09 16:46:31,208 [main] INFO yarn.cluster.YarnMiniClusterTestBase (YarnMiniClusterTestBase.groovy:describe(125)) - 2013-08-09 16:46:31,216 [main] INFO org.mortbay.log (Slf4jLog.java:info(67)) - Stopped SelectChannelConnector@stevel-2.local:0 2013-08-09 16:46:31,319 [main] INFO hadoop.ipc.Server (Server.java:stop(2429)) - Stopping server on 57710 2013-08-09 16:46:31,324 [IPC Server listener on 57710] INFO hadoop.ipc.Server (Server.java:run(720)) - Stopping IPC Server listener on 57710 2013-08-09 16:46:31,326 [IPC Server Responder] INFO hadoop.ipc.Server (Server.java:run(866)) - Stopping IPC Server Responder 2013-08-09 16:46:31,334 [main] INFO hadoop.ipc.Server (Server.java:stop(2429)) - Stopping server on 57711 2013-08-09 16:46:31,336 [IPC Server listener on 57711] INFO hadoop.ipc.Server (Server.java:run(720)) - Stopping IPC Server listener on 57711 2013-08-09 16:46:31,336 [IPC Server Responder] INFO hadoop.ipc.Server (Server.java:run(866)) - Stopping IPC Server Responder 2013-08-09 16:46:31,339 [Public Localizer] INFO containermanager.localizer.ResourceLocalizationService (ResourceLocalizationService.java:run(728)) - Public cache exiting 2013-08-09 16:46:31,340 [main] INFO server.nodemanager.NodeManager (NodeManager.java:cleanupContainers(261)) - Containers still running on SHUTDOWN : [container_1376091973496_0001_01_000001, container_1376091973496_0001_01_000002] 2013-08-09 16:46:31,342 [main] INFO server.nodemanager.NodeManager (NodeManager.java:cleanupContainers(270)) - Waiting for containers to be killed 2013-08-09 16:46:31,343 [AsyncDispatcher event handler] INFO containermanager.container.Container (ContainerImpl.java:handle(860)) - Container container_1376091973496_0001_01_000001 transitioned from RUNNING to KILLING 2013-08-09 16:46:31,343 [AsyncDispatcher event handler] INFO containermanager.container.Container (ContainerImpl.java:handle(860)) - Container container_1376091973496_0001_01_000002 transitioned from RUNNING to KILLING 2013-08-09 16:46:31,344 [AsyncDispatcher event handler] INFO containermanager.launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(323)) - Cleaning up container container_1376091973496_0001_01_000001 2013-08-09 16:46:31,386 [ContainersLauncher #0] WARN server.nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:launchContainer(207)) - Exit code from container container_1376091973496_0001_01_000001 is : 143 2013-08-09 16:46:31,403 [AsyncDispatcher event handler] INFO containermanager.launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(323)) - Cleaning up container container_1376091973496_0001_01_000002 2013-08-09 16:46:31,441 [ContainersLauncher #1] WARN server.nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:launchContainer(207)) - Exit code from container container_1376091973496_0001_01_000002 is : 143 2013-08-09 16:46:31,456 [AsyncDispatcher event handler] INFO containermanager.container.Container (ContainerImpl.java:handle(860)) - Container container_1376091973496_0001_01_000001 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL 2013-08-09 16:46:31,457 [AsyncDispatcher event handler] WARN containermanager.container.Container (ContainerImpl.java:handle(856)) - Can't handle this event at current state: Current: [CONTAINER_CLEANEDUP_AFTER_KILL], eventType: [CONTAINER_KILLED_ON_REQUEST] org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:853) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:73) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:692) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:685) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) at java.lang.Thread.run(Thread.java:680) 2013-08-09 16:46:31,457 [AsyncDispatcher event handler] INFO containermanager.container.Container (ContainerImpl.java:handle(860)) - Container container_1376091973496_0001_01_000001 transitioned from CONTAINER_CLEANEDUP_AFTER_KILL to null 2013-08-09 16:46:31,457 [AsyncDispatcher event handler] INFO containermanager.container.Container (ContainerImpl.java:handle(860)) - Container container_1376091973496_0001_01_000002 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL 2013-08-09 16:46:31,458 [AsyncDispatcher event handler] WARN containermanager.container.Container (ContainerImpl.java:handle(856)) - Can't handle this event at current state: Current: [CONTAINER_CLEANEDUP_AFTER_KILL], eventType: [CONTAINER_KILLED_ON_REQUEST] org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:853) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:73) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:692) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:685) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) at java.lang.Thread.run(Thread.java:680) 2013-08-09 16:46:31,458 [AsyncDispatcher event handler] INFO containermanager.container.Container (ContainerImpl.java:handle(860)) - Container container_1376091973496_0001_01_000002 transitioned from CONTAINER_CLEANEDUP_AFTER_KILL to null 2013-08-09 16:46:31,460 [DeletionService #1] INFO server.nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:deleteAsUser(368)) - Deleting absolute path : /Users/stevel/Projects/Hortonworks/Projects/hoya/target/TestLiveRegionService/TestLiveRegionService-localDir-nm-1_0/usercache/stevel/appcache/application_1376091973496_0001/container_1376091973496_0001_01_000001 2013-08-09 16:46:31,461 [DeletionService #2] INFO server.nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:deleteAsUser(368)) - Deleting absolute path : /Users/stevel/Projects/Hortonworks/Projects/hoya/target/TestLiveRegionService/TestLiveRegionService-localDir-nm-1_0/usercache/stevel/appcache/application_1376091973496_0001/container_1376091973496_0001_01_000002 2013-08-09 16:46:31,461 [AsyncDispatcher event handler] INFO server.nodemanager.NMAuditLogger (NMAuditLogger.java:logSuccess(89)) - USER=stevel OPERATION=Container Finished - Killed TARGET=ContainerImpl RESULT=SUCCESS APPID=application_1376091973496_0001 CONTAINERID=container_1376091973496_0001_01_000001 2013-08-09 16:46:31,465 [AsyncDispatcher event handler] INFO containermanager.container.Container (ContainerImpl.java:handle(860)) - Container container_1376091973496_0001_01_000001 transitioned from CONTAINER_CLEANEDUP_AFTER_KILL to DONE 2013-08-09 16:46:31,466 [AsyncDispatcher event handler] INFO server.nodemanager.NMAuditLogger (NMAuditLogger.java:logSuccess(89)) - USER=stevel OPERATION=Container Finished - Killed TARGET=ContainerImpl RESULT=SUCCESS APPID=application_1376091973496_0001 CONTAINERID=container_1376091973496_0001_01_000002 2013-08-09 16:46:31,466 [AsyncDispatcher event handler] INFO containermanager.container.Container (ContainerImpl.java:handle(860)) - Container container_1376091973496_0001_01_000002 transitioned from CONTAINER_CLEANEDUP_AFTER_KILL to DONE 2013-08-09 16:46:31,467 [AsyncDispatcher event handler] INFO containermanager.application.Application (ApplicationImpl.java:transition(320)) - Removing container_1376091973496_0001_01_000001 from application application_1376091973496_0001 2013-08-09 16:46:31,467 [AsyncDispatcher event handler] INFO containermanager.application.Application (ApplicationImpl.java:transition(320)) - Removing container_1376091973496_0001_01_000002 from application application_1376091973496_0001 2013-08-09 16:46:32,106 [ResourceManager Event Processor] INFO resourcemanager.rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(220)) - container_1376091973496_0001_01_000001 Container Transitioned from RUNNING to COMPLETED 2013-08-09 16:46:32,112 [AsyncDispatcher event handler] INFO server.resourcemanager.ApplicationMasterService (ApplicationMasterService.java:unregisterAttempt(546)) - Unregistering app attempt : appattempt_1376091973496_0001_000001 2013-08-09 16:46:32,112 [ResourceManager Event Processor] INFO common.fica.FiCaSchedulerApp (FiCaSchedulerApp.java:containerCompleted(246)) - Completed container: container_1376091973496_0001_01_000001 in state: COMPLETED event:FINISHED 2013-08-09 16:46:32,113 [ResourceManager Event Processor] INFO server.resourcemanager.RMAuditLogger (RMAuditLogger.java:logSuccess(98)) - USER=stevel OPERATION=AM Released ContaineTARGET=SchedulerApp RESULT=SUCCESS APPID=application_1376091973496_0001 CONTAINERID=container_1376091973496_0001_01_000001 2013-08-09 16:46:32,113 [ResourceManager Event Processor] INFO common.fica.FiCaSchedulerNode (FiCaSchedulerNode.java:releaseContainer(149)) - Released container container_1376091973496_0001_01_000001 of capacity <memory:64, vCores:1> on host 10.11.3.237:57707, which currently has 1 containers, <memory:256, vCores:1> used and <memory:3840, vCores:7> available, release resources=true 2013-08-09 16:46:32,113 [ResourceManager Event Processor] INFO scheduler.fifo.FifoScheduler (FifoScheduler.java:containerCompleted(761)) - Application appattempt_1376091973496_0001_000001 released container container_1376091973496_0001_01_000001 on node: host: 10.11.3.237:57707 #containers=1 available=3840 used=256 with event: FINISHED 2013-08-09 16:46:32,114 [AsyncDispatcher event handler] INFO rmapp.attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(607)) - appattempt_1376091973496_0001_000001 State change from RUNNING to FAILED 2013-08-09 16:46:32,114 [ResourceManager Event Processor] INFO resourcemanager.rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(220)) - container_1376091973496_0001_01_000002 Container Transitioned from RUNNING to COMPLETED 2013-08-09 16:46:32,114 [AsyncDispatcher event handler] INFO resourcemanager.rmapp.RMAppImpl (RMAppImpl.java:transition(778)) - Application application_1376091973496_0001 failed 1 times due to AM Container for appattempt_1376091973496_0001_000001 exited with exitCode: 143 due to: Container Killed on Shutdown Container killed on request. Exit code is 143 .Failing this attempt.. Failing the application. 2013-08-09 16:46:32,114 [ResourceManager Event Processor] INFO common.fica.FiCaSchedulerApp (FiCaSchedulerApp.java:containerCompleted(246)) - Completed container: container_1376091973496_0001_01_000002 in state: COMPLETED event:FINISHED 2013-08-09 16:46:32,114 [ResourceManager Event Processor] INFO server.resourcemanager.RMAuditLogger (RMAuditLogger.java:logSuccess(98)) - USER=stevel OPERATION=AM Released ContaineTARGET=SchedulerApp RESULT=SUCCESS APPID=application_1376091973496_0001 CONTAINERID=container_1376091973496_0001_01_000002 2013-08-09 16:46:32,114 [ResourceManager Event Processor] INFO common.fica.FiCaSchedulerNode (FiCaSchedulerNode.java:releaseContainer(149)) - Released container container_1376091973496_0001_01_000002 of capacity <memory:256, vCores:1> on host 10.11.3.237:57707, which currently has 0 containers, <memory:0, vCores:0> used and <memory:4096, vCores:8> available, release resources=true 2013-08-09 16:46:32,115 [ResourceManager Event Processor] INFO scheduler.fifo.FifoScheduler (FifoScheduler.java:containerCompleted(761)) - Application appattempt_1376091973496_0001_000001 released container container_1376091973496_0001_01_000002 on node: host: 10.11.3.237:57707 #containers=0 available=4096 used=0 with event: FINISHED 2013-08-09 16:46:32,115 [AsyncDispatcher event handler] INFO resourcemanager.rmapp.RMAppImpl (RMAppImpl.java:handle(588)) - application_1376091973496_0001 State change from RUNNING to FAILED 2013-08-09 16:46:32,116 [pool-1-thread-2] INFO resourcemanager.amlauncher.AMLauncher (AMLauncher.java:run(262)) - Cleaning master appattempt_1376091973496_0001_000001 2013-08-09 16:46:32,116 [ResourceManager Event Processor] INFO resourcemanager.scheduler.AppSchedulingInfo (AppSchedulingInfo.java:clearRequests(108)) - Application application_1376091973496_0001 requests cleared 2013-08-09 16:46:32,116 [AsyncDispatcher event handler] WARN server.resourcemanager.RMAuditLogger (RMAuditLogger.java:logFailure(255)) - USER=stevel OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1376091973496_0001 failed 1 times due to AM Container for appattempt_1376091973496_0001_000001 exited with exitCode: 143 due to: Container Killed on Shutdown Container killed on request. Exit code is 143 .Failing this attempt.. Failing the application. APPID=application_1376091973496_0001 2013-08-09 16:46:32,117 [AsyncDispatcher event handler] INFO resourcemanager.recovery.RMStateStore (RMStateStore.java:handleStoreEvent(467)) - Removing info for app: application_1376091973496_0001 2013-08-09 16:46:32,118 [AsyncDispatcher event handler] INFO server.resourcemanager.RMAppManager$ApplicationSummary (RMAppManager.java:logAppSummary(165)) - appId=application_1376091973496_0001,name=TestLiveRegionService,user=stevel,queue=default,state=FAILED,trackingUrl=stevel-2.local:57699/cluster/app/application_1376091973496_0001,appMasterHost=stevel-2.local,startTime=1376091981554,finishTime=1376091992114 2013-08-09 16:46:32,123 [Socket Reader #1 for port 57707] INFO hadoop.ipc.Server (Server.java:saslProcess(1334)) - Auth successful for appattempt_1376091973496_0001_000001 (auth:SIMPLE) 2013-08-09 16:46:32,128 [IPC Server handler 3 on 57707] INFO nodemanager.containermanager.ContainerManagerImpl (ContainerManagerImpl.java:stopContainerInternal(581)) - Stopping container with container Id: container_1376091973496_0001_01_000001 2013-08-09 16:46:32,361 [main] INFO server.nodemanager.NodeManager (NodeManager.java:cleanupContainers(304)) - All containers in DONE state 2013-08-09 16:46:32,366 [main] INFO org.mortbay.log (Slf4jLog.java:info(67)) - Stopped SelectChannelConnector@stevel-2.local:0 2013-08-09 16:46:32,469 [main] INFO hadoop.ipc.Server (Server.java:stop(2429)) - Stopping server on 57707 2013-08-09 16:46:32,472 [IPC Server listener on 57707] INFO hadoop.ipc.Server (Server.java:run(720)) - Stopping IPC Server listener on 57707 2013-08-09 16:46:32,474 [IPC Server Responder] INFO hadoop.ipc.Server (Server.java:run(866)) - Stopping IPC Server Responder 2013-08-09 16:46:32,476 [main] INFO hadoop.ipc.Server (Server.java:stop(2429)) - Stopping server on 57708 2013-08-09 16:46:32,479 [IPC Server listener on 57708] INFO hadoop.ipc.Server (Server.java:run(720)) - Stopping IPC Server listener on 57708 2013-08-09 16:46:32,479 [IPC Server Responder] INFO hadoop.ipc.Server (Server.java:run(866)) - Stopping IPC Server Responder 2013-08-09 16:46:32,479 [Public Localizer] INFO containermanager.localizer.ResourceLocalizationService (ResourceLocalizationService.java:run(728)) - Public cache exiting 2013-08-09 16:46:32,485 [main] INFO org.mortbay.log (Slf4jLog.java:info(67)) - Stopped SelectChannelConnector@stevel-2.local:0 2013-08-09 16:46:32,587 [main] INFO hadoop.ipc.Server (Server.java:stop(2429)) - Stopping server on 57704 2013-08-09 16:46:32,589 [IPC Server listener on 57704] INFO hadoop.ipc.Server (Server.java:run(720)) - Stopping IPC Server listener on 57704 2013-08-09 16:46:32,590 [IPC Server Responder] INFO hadoop.ipc.Server (Server.java:run(866)) - Stopping IPC Server Responder 2013-08-09 16:46:32,590 [main] INFO hadoop.ipc.Server (Server.java:stop(2429)) - Stopping server on 57705 2013-08-09 16:46:32,591 [IPC Server listener on 57705] INFO hadoop.ipc.Server (Server.java:run(720)) - Stopping IPC Server listener on 57705 2013-08-09 16:46:32,591 [IPC Server Responder] INFO hadoop.ipc.Server (Server.java:run(866)) - Stopping IPC Server Responder 2013-08-09 16:46:32,591 [Public Localizer] INFO containermanager.localizer.ResourceLocalizationService (ResourceLocalizationService.java:run(728)) - Public cache exiting 2013-08-09 16:46:32,592 [main] INFO org.mortbay.log (Slf4jLog.java:info(67)) - Stopped SelectChannelConnector@stevel-2.local:0 2013-08-09 16:46:32,694 [Thread[Thread-14,5,main]] ERROR token.delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(513)) - InterruptedExcpetion recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep interrupted 2013-08-09 16:46:32,695 [ApplicationMaster Launcher] WARN resourcemanager.amlauncher.ApplicationMasterLauncher (ApplicationMasterLauncher.java:run(98)) - org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread interrupted. Returning. 2013-08-09 16:46:32,695 [main] INFO hadoop.ipc.Server (Server.java:stop(2429)) - Stopping server on 57703 {code} > Invalid state transition exception caught when tearing down a (mini) cluster > ---------------------------------------------------------------------------- > > Key: YARN-1054 > URL: https://issues.apache.org/jira/browse/YARN-1054 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager > Affects Versions: 2.1.1-beta > Reporter: Steve Loughran > Priority: Minor > > When I'm tearing down a MiniYARNCluster I get a stack trace warning that an > invalid state transition has been attempted > {code} > [CONTAINER_KILLED_ON_REQUEST]org.apache.hadoop.yarn.state.InvalidStateTransitonException: > Invalid event: CONTAINER_KILLED_ON_REQUEST > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira