[ 
https://issues.apache.org/jira/browse/YARN-6068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806626#comment-15806626
 ] 

Junping Du commented on YARN-6068:
----------------------------------

In YARN-4325, we added an aggregation failure event to fix the application 
leak in the NM state store. However, we missed one case: when the NM restarts, 
log aggregation can be aborted rather than finished. In that case, we 
shouldn't send the aggregation failure event.
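
A minimal sketch of the intended behavior, not the actual patch: the class 
AggregatorSketch, its fields, and the uploadSucceeds parameter are 
hypothetical names used only for illustration. Only abortLogAggregation and 
APPLICATION_LOG_HANDLING_FAILED come from the log in the issue description. 
The FINISHED event name is assumed here as the counterpart of the failure 
event.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the aggregator; not the real
// AppLogAggregatorImpl.
class AggregatorSketch {
    enum Event {
        APPLICATION_LOG_HANDLING_FINISHED, // assumed counterpart event
        APPLICATION_LOG_HANDLING_FAILED
    }

    // Stand-in for the NM event dispatcher: records what would be sent.
    private final List<Event> dispatched = new ArrayList<>();
    private volatile boolean aborted = false;
    private boolean completed = false;

    // Called when the NM is stopping or restarting with recovery enabled.
    void abortLogAggregation() {
        aborted = true;
    }

    // uploadSucceeds simulates whether the log upload would have worked.
    void run(boolean uploadSucceeds) {
        if (!aborted && uploadSucceeds) {
            completed = true;
        }
        if (completed) {
            dispatched.add(Event.APPLICATION_LOG_HANDLING_FINISHED);
        } else if (!aborted) {
            // A genuine failure: report it so the app can be cleaned up.
            dispatched.add(Event.APPLICATION_LOG_HANDLING_FAILED);
        }
        // Aborted: send nothing, so the application stays in RUNNING and
        // aggregation can resume after the NM recovers.
    }

    List<Event> dispatchedEvents() {
        return dispatched;
    }

    public static void main(String[] args) {
        AggregatorSketch restarted = new AggregatorSketch();
        restarted.abortLogAggregation();
        restarted.run(true);
        System.out.println("aborted run sent: " + restarted.dispatchedEvents());

        AggregatorSketch failed = new AggregatorSketch();
        failed.run(false);
        System.out.println("failed run sent: " + failed.dispatchedEvents());
    }
}
```

The design point is that an abort is neither success nor failure, so the 
sketch treats it as a third outcome that produces no application event.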

> Log aggregation get failed when NM restart even with recovery
> -------------------------------------------------------------
>
>                 Key: YARN-6068
>                 URL: https://issues.apache.org/jira/browse/YARN-6068
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>
> The exception log is as follows:
> {noformat}
> 2017-01-05 19:16:36,352 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:abortLogAggregation(527)) - Aborting log 
> aggregation for application_1483640789847_0001
> 2017-01-05 19:16:36,352 WARN  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:run(399)) - Aggregation did not complete for 
> application application_1483640789847_0001
> 2017-01-05 19:16:36,353 WARN  application.ApplicationImpl 
> (ApplicationImpl.java:handle(461)) - Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> APPLICATION_LOG_HANDLING_FAILED at RUNNING
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:459)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:64)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1084)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1076)
>         at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
>         at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
>         at java.lang.Thread.run(Thread.java:745)
> 2017-01-05 19:16:36,355 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(464)) - Application 
> application_1483640789847_0001 transitioned from RUNNING to null
> {noformat}



