Eric Badger created YARN-7114:
---------------------------------
Summary: NM can fail during shutdown with log aggregation
Key: YARN-7114
URL: https://issues.apache.org/jira/browse/YARN-7114
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.8.1
Reporter: Eric Badger
{noformat}
2017-08-24 16:36:35,961 [AsyncDispatcher event handler] WARN application.ApplicationImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at FINISHING_CONTAINERS_WAIT
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:458)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:63)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1314)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1306)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
        at java.lang.Thread.run(Thread.java:745)
2017-08-24 16:36:35,962 [AsyncDispatcher event handler] INFO application.ApplicationImpl: Application application_1502220952225_46598 transitioned from FINISHING_CONTAINERS_WAIT to null
2017-08-24 16:36:36,056 [AsyncDispatcher event handler] WARN application.ApplicationImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at FINISHING_CONTAINERS_WAIT
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:458)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:63)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1314)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1306)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
        at java.lang.Thread.run(Thread.java:745)
{noformat}
This was triggered by restarting the RM with a newer version. The NM's
version was unchanged, so it was kicked out of the cluster during
registration. The NM then ran log aggregation as part of its shutdown and
failed when aggregation finished: the application had never requested log
aggregation (the shutdown forced it), so it was still in
FINISHING_CONTAINERS_WAIT, a state with no transition for
APPLICATION_LOG_HANDLING_FINISHED. The failure was seen in 2.8, but I believe
the problem also exists in 2.9 and trunk.
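
To illustrate the failure mode, here is a minimal sketch of a table-driven state machine that rejects any event with no registered transition, which is the behavior the stack trace above shows. AppStateSketch and its allow/handle methods are hypothetical names made up for this example. They are not the real ApplicationImpl or StateMachineFactory API, and only the state and event names taken from the log are real. The self-loop in the second half is an assumed mitigation for discussion, not a confirmed fix.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, simplified model of the failure; NOT the real ApplicationImpl.
public class AppStateSketch {
    // Only the names that appear in the log above are taken from YARN.
    enum State { FINISHING_CONTAINERS_WAIT, FINISHED }
    enum Event { APPLICATION_LOG_HANDLING_FINISHED }

    // Transition table: current state -> (event -> next state).
    final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    State current;

    AppStateSketch(State initial) {
        this.current = initial;
    }

    // Register a legal transition; returns this so calls can be chained.
    AppStateSketch allow(State from, Event on, State to) {
        table.computeIfAbsent(from, s -> new EnumMap<>(Event.class)).put(on, to);
        return this;
    }

    // Apply an event, rejecting it if no transition is registered.
    State handle(Event event) {
        Map<Event, State> row = table.get(current);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new IllegalStateException(
                "Invalid event: " + event + " at " + current);
        }
        current = next;
        return current;
    }

    public static void main(String[] args) {
        // No transition registered: the event is rejected, as in the log.
        AppStateSketch strict =
            new AppStateSketch(State.FINISHING_CONTAINERS_WAIT);
        try {
            strict.handle(Event.APPLICATION_LOG_HANDLING_FINISHED);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }

        // Assumed mitigation: a self-loop so the forced event is ignored.
        AppStateSketch tolerant =
            new AppStateSketch(State.FINISHING_CONTAINERS_WAIT)
                .allow(State.FINISHING_CONTAINERS_WAIT,
                       Event.APPLICATION_LOG_HANDLING_FINISHED,
                       State.FINISHING_CONTAINERS_WAIT);
        System.out.println(
            tolerant.handle(Event.APPLICATION_LOG_HANDLING_FINISHED));
    }
}
```

Whether the right change is to accept the event in FINISHING_CONTAINERS_WAIT or to avoid sending it during a forced shutdown is an open question that the sketch does not settle.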