[ 
https://issues.apache.org/jira/browse/YARN-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735545#comment-13735545
 ] 

Steve Loughran commented on YARN-1054:
--------------------------------------

logs with stack
{code}
2013-08-09 16:46:31,208 [main] INFO  yarn.cluster.YarnMiniClusterTestBase 
(YarnMiniClusterTestBase.groovy:describe(121)) - 
2013-08-09 16:46:31,208 [main] INFO  yarn.cluster.YarnMiniClusterTestBase 
(YarnMiniClusterTestBase.groovy:describe(122)) - ===============================
2013-08-09 16:46:31,208 [main] INFO  yarn.cluster.YarnMiniClusterTestBase 
(YarnMiniClusterTestBase.groovy:describe(123)) - teardown
2013-08-09 16:46:31,208 [main] INFO  yarn.cluster.YarnMiniClusterTestBase 
(YarnMiniClusterTestBase.groovy:describe(124)) - ===============================
2013-08-09 16:46:31,208 [main] INFO  yarn.cluster.YarnMiniClusterTestBase 
(YarnMiniClusterTestBase.groovy:describe(125)) - 
2013-08-09 16:46:31,216 [main] INFO  org.mortbay.log (Slf4jLog.java:info(67)) - 
Stopped SelectChannelConnector@stevel-2.local:0
2013-08-09 16:46:31,319 [main] INFO  hadoop.ipc.Server (Server.java:stop(2429)) 
- Stopping server on 57710
2013-08-09 16:46:31,324 [IPC Server listener on 57710] INFO  hadoop.ipc.Server 
(Server.java:run(720)) - Stopping IPC Server listener on 57710
2013-08-09 16:46:31,326 [IPC Server Responder] INFO  hadoop.ipc.Server 
(Server.java:run(866)) - Stopping IPC Server Responder
2013-08-09 16:46:31,334 [main] INFO  hadoop.ipc.Server (Server.java:stop(2429)) 
- Stopping server on 57711
2013-08-09 16:46:31,336 [IPC Server listener on 57711] INFO  hadoop.ipc.Server 
(Server.java:run(720)) - Stopping IPC Server listener on 57711
2013-08-09 16:46:31,336 [IPC Server Responder] INFO  hadoop.ipc.Server 
(Server.java:run(866)) - Stopping IPC Server Responder
2013-08-09 16:46:31,339 [Public Localizer] INFO  
containermanager.localizer.ResourceLocalizationService 
(ResourceLocalizationService.java:run(728)) - Public cache exiting
2013-08-09 16:46:31,340 [main] INFO  server.nodemanager.NodeManager 
(NodeManager.java:cleanupContainers(261)) - Containers still running on 
SHUTDOWN : [container_1376091973496_0001_01_000001, 
container_1376091973496_0001_01_000002]
2013-08-09 16:46:31,342 [main] INFO  server.nodemanager.NodeManager 
(NodeManager.java:cleanupContainers(270)) - Waiting for containers to be killed
2013-08-09 16:46:31,343 [AsyncDispatcher event handler] INFO  
containermanager.container.Container (ContainerImpl.java:handle(860)) - 
Container container_1376091973496_0001_01_000001 transitioned from RUNNING to 
KILLING
2013-08-09 16:46:31,343 [AsyncDispatcher event handler] INFO  
containermanager.container.Container (ContainerImpl.java:handle(860)) - 
Container container_1376091973496_0001_01_000002 transitioned from RUNNING to 
KILLING
2013-08-09 16:46:31,344 [AsyncDispatcher event handler] INFO  
containermanager.launcher.ContainerLaunch 
(ContainerLaunch.java:cleanupContainer(323)) - Cleaning up container 
container_1376091973496_0001_01_000001
2013-08-09 16:46:31,386 [ContainersLauncher #0] WARN  
server.nodemanager.DefaultContainerExecutor 
(DefaultContainerExecutor.java:launchContainer(207)) - Exit code from container 
container_1376091973496_0001_01_000001 is : 143
2013-08-09 16:46:31,403 [AsyncDispatcher event handler] INFO  
containermanager.launcher.ContainerLaunch 
(ContainerLaunch.java:cleanupContainer(323)) - Cleaning up container 
container_1376091973496_0001_01_000002
2013-08-09 16:46:31,441 [ContainersLauncher #1] WARN  
server.nodemanager.DefaultContainerExecutor 
(DefaultContainerExecutor.java:launchContainer(207)) - Exit code from container 
container_1376091973496_0001_01_000002 is : 143
2013-08-09 16:46:31,456 [AsyncDispatcher event handler] INFO  
containermanager.container.Container (ContainerImpl.java:handle(860)) - 
Container container_1376091973496_0001_01_000001 transitioned from KILLING to 
CONTAINER_CLEANEDUP_AFTER_KILL
2013-08-09 16:46:31,457 [AsyncDispatcher event handler] WARN  
containermanager.container.Container (ContainerImpl.java:handle(856)) - Can't 
handle this event at current state: Current: [CONTAINER_CLEANEDUP_AFTER_KILL], 
eventType: [CONTAINER_KILLED_ON_REQUEST]
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL
        at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:853)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:73)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:692)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:685)
        at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
        at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)
        at java.lang.Thread.run(Thread.java:680)
2013-08-09 16:46:31,457 [AsyncDispatcher event handler] INFO  
containermanager.container.Container (ContainerImpl.java:handle(860)) - 
Container container_1376091973496_0001_01_000001 transitioned from 
CONTAINER_CLEANEDUP_AFTER_KILL to null
2013-08-09 16:46:31,457 [AsyncDispatcher event handler] INFO  
containermanager.container.Container (ContainerImpl.java:handle(860)) - 
Container container_1376091973496_0001_01_000002 transitioned from KILLING to 
CONTAINER_CLEANEDUP_AFTER_KILL
2013-08-09 16:46:31,458 [AsyncDispatcher event handler] WARN  
containermanager.container.Container (ContainerImpl.java:handle(856)) - Can't 
handle this event at current state: Current: [CONTAINER_CLEANEDUP_AFTER_KILL], 
eventType: [CONTAINER_KILLED_ON_REQUEST]
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL
        at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:853)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:73)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:692)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:685)
        at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
        at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)
        at java.lang.Thread.run(Thread.java:680)
2013-08-09 16:46:31,458 [AsyncDispatcher event handler] INFO  
containermanager.container.Container (ContainerImpl.java:handle(860)) - 
Container container_1376091973496_0001_01_000002 transitioned from 
CONTAINER_CLEANEDUP_AFTER_KILL to null
2013-08-09 16:46:31,460 [DeletionService #1] INFO  
server.nodemanager.DefaultContainerExecutor 
(DefaultContainerExecutor.java:deleteAsUser(368)) - Deleting absolute path : 
/Users/stevel/Projects/Hortonworks/Projects/hoya/target/TestLiveRegionService/TestLiveRegionService-localDir-nm-1_0/usercache/stevel/appcache/application_1376091973496_0001/container_1376091973496_0001_01_000001
2013-08-09 16:46:31,461 [DeletionService #2] INFO  
server.nodemanager.DefaultContainerExecutor 
(DefaultContainerExecutor.java:deleteAsUser(368)) - Deleting absolute path : 
/Users/stevel/Projects/Hortonworks/Projects/hoya/target/TestLiveRegionService/TestLiveRegionService-localDir-nm-1_0/usercache/stevel/appcache/application_1376091973496_0001/container_1376091973496_0001_01_000002
2013-08-09 16:46:31,461 [AsyncDispatcher event handler] INFO  
server.nodemanager.NMAuditLogger (NMAuditLogger.java:logSuccess(89)) - 
USER=stevel        OPERATION=Container Finished - Killed   TARGET=ContainerImpl 
   RESULT=SUCCESS  APPID=application_1376091973496_0001    
CONTAINERID=container_1376091973496_0001_01_000001
2013-08-09 16:46:31,465 [AsyncDispatcher event handler] INFO  
containermanager.container.Container (ContainerImpl.java:handle(860)) - 
Container container_1376091973496_0001_01_000001 transitioned from 
CONTAINER_CLEANEDUP_AFTER_KILL to DONE
2013-08-09 16:46:31,466 [AsyncDispatcher event handler] INFO  
server.nodemanager.NMAuditLogger (NMAuditLogger.java:logSuccess(89)) - 
USER=stevel        OPERATION=Container Finished - Killed   TARGET=ContainerImpl 
   RESULT=SUCCESS  APPID=application_1376091973496_0001    
CONTAINERID=container_1376091973496_0001_01_000002
2013-08-09 16:46:31,466 [AsyncDispatcher event handler] INFO  
containermanager.container.Container (ContainerImpl.java:handle(860)) - 
Container container_1376091973496_0001_01_000002 transitioned from 
CONTAINER_CLEANEDUP_AFTER_KILL to DONE
2013-08-09 16:46:31,467 [AsyncDispatcher event handler] INFO  
containermanager.application.Application (ApplicationImpl.java:transition(320)) 
- Removing container_1376091973496_0001_01_000001 from application 
application_1376091973496_0001
2013-08-09 16:46:31,467 [AsyncDispatcher event handler] INFO  
containermanager.application.Application (ApplicationImpl.java:transition(320)) 
- Removing container_1376091973496_0001_01_000002 from application 
application_1376091973496_0001
2013-08-09 16:46:32,106 [ResourceManager Event Processor] INFO  
resourcemanager.rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(220)) 
- container_1376091973496_0001_01_000001 Container Transitioned from RUNNING to 
COMPLETED
2013-08-09 16:46:32,112 [AsyncDispatcher event handler] INFO  
server.resourcemanager.ApplicationMasterService 
(ApplicationMasterService.java:unregisterAttempt(546)) - Unregistering app 
attempt : appattempt_1376091973496_0001_000001
2013-08-09 16:46:32,112 [ResourceManager Event Processor] INFO  
common.fica.FiCaSchedulerApp (FiCaSchedulerApp.java:containerCompleted(246)) - 
Completed container: container_1376091973496_0001_01_000001 in state: COMPLETED 
event:FINISHED
2013-08-09 16:46:32,113 [ResourceManager Event Processor] INFO  
server.resourcemanager.RMAuditLogger (RMAuditLogger.java:logSuccess(98)) - 
USER=stevel  OPERATION=AM Released ContaineTARGET=SchedulerApp       
RESULT=SUCCESS  APPID=application_1376091973496_0001    
CONTAINERID=container_1376091973496_0001_01_000001
2013-08-09 16:46:32,113 [ResourceManager Event Processor] INFO  
common.fica.FiCaSchedulerNode (FiCaSchedulerNode.java:releaseContainer(149)) - 
Released container container_1376091973496_0001_01_000001 of capacity 
<memory:64, vCores:1> on host 10.11.3.237:57707, which currently has 1 
containers, <memory:256, vCores:1> used and <memory:3840, vCores:7> available, 
release resources=true
2013-08-09 16:46:32,113 [ResourceManager Event Processor] INFO  
scheduler.fifo.FifoScheduler (FifoScheduler.java:containerCompleted(761)) - 
Application appattempt_1376091973496_0001_000001 released container 
container_1376091973496_0001_01_000001 on node: host: 10.11.3.237:57707 
#containers=1 available=3840 used=256 with event: FINISHED
2013-08-09 16:46:32,114 [AsyncDispatcher event handler] INFO  
rmapp.attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(607)) - 
appattempt_1376091973496_0001_000001 State change from RUNNING to FAILED
2013-08-09 16:46:32,114 [ResourceManager Event Processor] INFO  
resourcemanager.rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(220)) 
- container_1376091973496_0001_01_000002 Container Transitioned from RUNNING to 
COMPLETED
2013-08-09 16:46:32,114 [AsyncDispatcher event handler] INFO  
resourcemanager.rmapp.RMAppImpl (RMAppImpl.java:transition(778)) - Application 
application_1376091973496_0001 failed 1 times due to AM Container for 
appattempt_1376091973496_0001_000001 exited with  exitCode: 143 due to: 
Container Killed on Shutdown
Container killed on request. Exit code is 143
.Failing this attempt.. Failing the application.
2013-08-09 16:46:32,114 [ResourceManager Event Processor] INFO  
common.fica.FiCaSchedulerApp (FiCaSchedulerApp.java:containerCompleted(246)) - 
Completed container: container_1376091973496_0001_01_000002 in state: COMPLETED 
event:FINISHED
2013-08-09 16:46:32,114 [ResourceManager Event Processor] INFO  
server.resourcemanager.RMAuditLogger (RMAuditLogger.java:logSuccess(98)) - 
USER=stevel  OPERATION=AM Released ContaineTARGET=SchedulerApp       
RESULT=SUCCESS  APPID=application_1376091973496_0001    
CONTAINERID=container_1376091973496_0001_01_000002
2013-08-09 16:46:32,114 [ResourceManager Event Processor] INFO  
common.fica.FiCaSchedulerNode (FiCaSchedulerNode.java:releaseContainer(149)) - 
Released container container_1376091973496_0001_01_000002 of capacity 
<memory:256, vCores:1> on host 10.11.3.237:57707, which currently has 0 
containers, <memory:0, vCores:0> used and <memory:4096, vCores:8> available, 
release resources=true
2013-08-09 16:46:32,115 [ResourceManager Event Processor] INFO  
scheduler.fifo.FifoScheduler (FifoScheduler.java:containerCompleted(761)) - 
Application appattempt_1376091973496_0001_000001 released container 
container_1376091973496_0001_01_000002 on node: host: 10.11.3.237:57707 
#containers=0 available=4096 used=0 with event: FINISHED
2013-08-09 16:46:32,115 [AsyncDispatcher event handler] INFO  
resourcemanager.rmapp.RMAppImpl (RMAppImpl.java:handle(588)) - 
application_1376091973496_0001 State change from RUNNING to FAILED
2013-08-09 16:46:32,116 [pool-1-thread-2] INFO  
resourcemanager.amlauncher.AMLauncher (AMLauncher.java:run(262)) - Cleaning 
master appattempt_1376091973496_0001_000001
2013-08-09 16:46:32,116 [ResourceManager Event Processor] INFO  
resourcemanager.scheduler.AppSchedulingInfo 
(AppSchedulingInfo.java:clearRequests(108)) - Application 
application_1376091973496_0001 requests cleared
2013-08-09 16:46:32,116 [AsyncDispatcher event handler] WARN  
server.resourcemanager.RMAuditLogger (RMAuditLogger.java:logFailure(255)) - 
USER=stevel   OPERATION=Application Finished - Failed TARGET=RMAppManager     
RESULT=FAILURE  DESCRIPTION=App failed with state: FAILED       
PERMISSIONS=Application application_1376091973496_0001 failed 1 times due to AM 
Container for appattempt_1376091973496_0001_000001 exited with  exitCode: 143 
due to: Container Killed on Shutdown
Container killed on request. Exit code is 143
.Failing this attempt.. Failing the application.        
APPID=application_1376091973496_0001
2013-08-09 16:46:32,117 [AsyncDispatcher event handler] INFO  
resourcemanager.recovery.RMStateStore (RMStateStore.java:handleStoreEvent(467)) 
- Removing info for app: application_1376091973496_0001
2013-08-09 16:46:32,118 [AsyncDispatcher event handler] INFO  
server.resourcemanager.RMAppManager$ApplicationSummary 
(RMAppManager.java:logAppSummary(165)) - 
appId=application_1376091973496_0001,name=TestLiveRegionService,user=stevel,queue=default,state=FAILED,trackingUrl=stevel-2.local:57699/cluster/app/application_1376091973496_0001,appMasterHost=stevel-2.local,startTime=1376091981554,finishTime=1376091992114
2013-08-09 16:46:32,123 [Socket Reader #1 for port 57707] INFO  
hadoop.ipc.Server (Server.java:saslProcess(1334)) - Auth successful for 
appattempt_1376091973496_0001_000001 (auth:SIMPLE)
2013-08-09 16:46:32,128 [IPC Server handler 3 on 57707] INFO  
nodemanager.containermanager.ContainerManagerImpl 
(ContainerManagerImpl.java:stopContainerInternal(581)) - Stopping container 
with container Id: container_1376091973496_0001_01_000001
2013-08-09 16:46:32,361 [main] INFO  server.nodemanager.NodeManager 
(NodeManager.java:cleanupContainers(304)) - All containers in DONE state
2013-08-09 16:46:32,366 [main] INFO  org.mortbay.log (Slf4jLog.java:info(67)) - 
Stopped SelectChannelConnector@stevel-2.local:0
2013-08-09 16:46:32,469 [main] INFO  hadoop.ipc.Server (Server.java:stop(2429)) 
- Stopping server on 57707
2013-08-09 16:46:32,472 [IPC Server listener on 57707] INFO  hadoop.ipc.Server 
(Server.java:run(720)) - Stopping IPC Server listener on 57707
2013-08-09 16:46:32,474 [IPC Server Responder] INFO  hadoop.ipc.Server 
(Server.java:run(866)) - Stopping IPC Server Responder
2013-08-09 16:46:32,476 [main] INFO  hadoop.ipc.Server (Server.java:stop(2429)) 
- Stopping server on 57708
2013-08-09 16:46:32,479 [IPC Server listener on 57708] INFO  hadoop.ipc.Server 
(Server.java:run(720)) - Stopping IPC Server listener on 57708
2013-08-09 16:46:32,479 [IPC Server Responder] INFO  hadoop.ipc.Server 
(Server.java:run(866)) - Stopping IPC Server Responder
2013-08-09 16:46:32,479 [Public Localizer] INFO  
containermanager.localizer.ResourceLocalizationService 
(ResourceLocalizationService.java:run(728)) - Public cache exiting
2013-08-09 16:46:32,485 [main] INFO  org.mortbay.log (Slf4jLog.java:info(67)) - 
Stopped SelectChannelConnector@stevel-2.local:0
2013-08-09 16:46:32,587 [main] INFO  hadoop.ipc.Server (Server.java:stop(2429)) 
- Stopping server on 57704
2013-08-09 16:46:32,589 [IPC Server listener on 57704] INFO  hadoop.ipc.Server 
(Server.java:run(720)) - Stopping IPC Server listener on 57704
2013-08-09 16:46:32,590 [IPC Server Responder] INFO  hadoop.ipc.Server 
(Server.java:run(866)) - Stopping IPC Server Responder
2013-08-09 16:46:32,590 [main] INFO  hadoop.ipc.Server (Server.java:stop(2429)) 
- Stopping server on 57705
2013-08-09 16:46:32,591 [IPC Server listener on 57705] INFO  hadoop.ipc.Server 
(Server.java:run(720)) - Stopping IPC Server listener on 57705
2013-08-09 16:46:32,591 [IPC Server Responder] INFO  hadoop.ipc.Server 
(Server.java:run(866)) - Stopping IPC Server Responder
2013-08-09 16:46:32,591 [Public Localizer] INFO  
containermanager.localizer.ResourceLocalizationService 
(ResourceLocalizationService.java:run(728)) - Public cache exiting
2013-08-09 16:46:32,592 [main] INFO  org.mortbay.log (Slf4jLog.java:info(67)) - 
Stopped SelectChannelConnector@stevel-2.local:0
2013-08-09 16:46:32,694 [Thread[Thread-14,5,main]] ERROR 
token.delegation.AbstractDelegationTokenSecretManager 
(AbstractDelegationTokenSecretManager.java:run(513)) - InterruptedExcpetion 
recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep 
interrupted
2013-08-09 16:46:32,695 [ApplicationMaster Launcher] WARN  
resourcemanager.amlauncher.ApplicationMasterLauncher 
(ApplicationMasterLauncher.java:run(98)) - 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread
 interrupted. Returning.
2013-08-09 16:46:32,695 [main] INFO  hadoop.ipc.Server (Server.java:stop(2429)) 
- Stopping server on 57703

{code}
                
> Invalid state transition exception caught when tearing down a (mini) cluster
> ----------------------------------------------------------------------------
>
>                 Key: YARN-1054
>                 URL: https://issues.apache.org/jira/browse/YARN-1054
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.1.1-beta
>            Reporter: Steve Loughran
>            Priority: Minor
>
> When I'm tearing down a MiniYARNCluster I get a stack trace warning that an 
> invalid state transition has been attempted
> {code}
> [CONTAINER_KILLED_ON_REQUEST]org.apache.hadoop.yarn.state.InvalidStateTransitonException:
>  Invalid event: CONTAINER_KILLED_ON_REQUEST
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to