[ 
https://issues.apache.org/jira/browse/YARN-6068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806868#comment-15806868
 ] 

Varun Saxena commented on YARN-6068:
------------------------------------

Thanks [~djp] for raising the issue. We in fact saw exactly the same issue in 
our clusters last night.
The changes as such look fine to me.
The patch adds an additional log message ("Log aggregation abort for 
application .... due to NM restart"). I don't think it is required: a log is 
already printed when we call AppLogAggregatorImpl#abortLogAggregation (the 
"Aborting log aggregation for ..." line in the log below), and that should be 
good enough.

> Log aggregation get failed when NM restart even with recovery
> -------------------------------------------------------------
>
>                 Key: YARN-6068
>                 URL: https://issues.apache.org/jira/browse/YARN-6068
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: YARN-6068.patch
>
>
> The exception log is as follows:
> {noformat}
> 2017-01-05 19:16:36,352 INFO  logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:abortLogAggregation(527)) - Aborting log aggregation for application_1483640789847_0001
> 2017-01-05 19:16:36,352 WARN  logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:run(399)) - Aggregation did not complete for application application_1483640789847_0001
> 2017-01-05 19:16:36,353 WARN  application.ApplicationImpl (ApplicationImpl.java:handle(461)) - Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FAILED at RUNNING
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:459)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:64)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1084)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1076)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
>         at java.lang.Thread.run(Thread.java:745)
> 2017-01-05 19:16:36,355 INFO  application.ApplicationImpl (ApplicationImpl.java:handle(464)) - Application application_1483640789847_0001 transitioned from RUNNING to null
> {noformat}



