[
https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16266987#comment-16266987
]
lujie commented on YARN-7563:
-----------------------------
I have found the cause by analyzing the code and logs.
!YARN-7536.png!
The figure above shows the cause. The client submits an application and then
sends a kill command. The NM starts the container through
ContainerManagerImpl.startContainerInternal, which (1) puts the appID in the
context and then (4) sends INIT_APPLICATION. Meanwhile, the NodeManager learns
from ResourceTrackerService.nodeHeartbeat that the app needs to be cleaned up
and sends a FINISH_APPS event to ContainerManagerImpl. ContainerManagerImpl
first (2) checks whether the appID exists in the context and, if it does,
(3) sends FINISH_APPLICATION.
The bug manifests only when two conditions hold: (1) happens before (2), and
(3) happens before (4). If either condition is violated, the bug stays hidden.
I need to check the ApplicationImpl code further to determine whether
AppFinishTriggeredTransition is needed to fix this bug.
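The interleaving described above can be replayed deterministically in a small
sketch. Nothing below is NodeManager code: the class FinishBeforeInitSketch,
the replay helper, the toy two-event state machine, and the FINISHING state are
hypothetical names used only for illustration. The sketch assumes that the
events are delivered to the application in the order they were sent, and that
FINISH_APPLICATION is accepted once the application has left NEW.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

// Hypothetical toy model of the race; not the real ApplicationImpl.
class FinishBeforeInitSketch {
    enum State { NEW, INITING, FINISHING }
    enum Event { INIT_APPLICATION, FINISH_APPLICATION }

    // Replays steps (1)-(4) from the comment in the given order and
    // returns what the toy state machine logged.
    static List<String> replay(int... order) {
        Set<String> context = new HashSet<>();       // stands in for the NM context
        Queue<Event> dispatcher = new ArrayDeque<>(); // FIFO event delivery (assumed)
        boolean appSeen = false;

        for (int step : order) {
            switch (step) {
                case 1: context.add("app"); break;                 // put appID in context
                case 2: appSeen = context.contains("app"); break;  // check appID exists
                case 3: if (appSeen) dispatcher.add(Event.FINISH_APPLICATION); break;
                case 4: dispatcher.add(Event.INIT_APPLICATION); break;
                default: throw new IllegalArgumentException("unknown step " + step);
            }
        }

        State state = State.NEW;
        List<String> log = new ArrayList<>();
        for (Event e : dispatcher) {
            if (state == State.NEW && e == Event.INIT_APPLICATION) {
                state = State.INITING;
                log.add("NEW -> INITING");
            } else if (state == State.INITING && e == Event.FINISH_APPLICATION) {
                state = State.FINISHING;
                log.add("INITING -> FINISHING");
            } else {
                // No transition registered for this (state, event) pair.
                log.add("Invalid event: " + e + " at " + state);
            }
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(replay(1, 2, 3, 4)); // (1)<(2) and (3)<(4): bug shows
        System.out.println(replay(1, 2, 4, 3)); // (4) before (3): bug hidden
        System.out.println(replay(2, 1, 4, 3)); // (2) before (1): bug hidden
    }
}
```

Only the first ordering produces "Invalid event: FINISH_APPLICATION at NEW"
followed by the NEW to INITING transition, which is the same sequence as in the
NodeManager log quoted below. In the third ordering no FINISH_APPLICATION is
sent at all, because the appID check runs before the appID is in the context.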
> Invalid event: FINISH_APPLICATION at NEW
> ----------------------------------------
>
> Key: YARN-7563
> URL: https://issues.apache.org/jira/browse/YARN-7563
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.0.0-beta1
> Reporter: lujie
> Attachments: YARN-7536.png
>
>
> I sent a kill command to an application, and the NodeManager log shows:
> {code:java}
> 2017-11-25 19:18:48,126 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: couldn't find container container_1511608703018_0001_01_000001 while processing FINISH_CONTAINERS event
> 2017-11-25 19:18:48,146 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: FINISH_APPLICATION at NEW
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:627)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:75)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1508)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1501)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
>     at java.lang.Thread.run(Thread.java:745)
> 2017-11-25 19:18:48,151 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Application application_1511608703018_0001 transitioned from NEW to INITING
> {code}
>