[ https://issues.apache.org/jira/browse/YARN-6068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806868#comment-15806868 ]
Varun Saxena commented on YARN-6068:
------------------------------------

Thanks [~djp] for raising the issue. We in fact saw the exact same issue in our clusters last night.

The changes as such look fine to me. The patch adds an additional log message ("Log aggregation abort for application .... due to NM restart"). I think this is not required: we already print a log when we call AppLogAggregatorImpl#abortLogAggregation, and that should be good enough, I guess.

> Log aggregation get failed when NM restart even with recovery
> -------------------------------------------------------------
>
>                 Key: YARN-6068
>                 URL: https://issues.apache.org/jira/browse/YARN-6068
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>         Attachments: YARN-6068.patch
>
>
> The exception log is as follows:
> {noformat}
> 2017-01-05 19:16:36,352 INFO  logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:abortLogAggregation(527)) - Aborting log aggregation for application_1483640789847_0001
> 2017-01-05 19:16:36,352 WARN  logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:run(399)) - Aggregation did not complete for application application_1483640789847_0001
> 2017-01-05 19:16:36,353 WARN  application.ApplicationImpl (ApplicationImpl.java:handle(461)) - Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FAILED at RUNNING
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:459)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:64)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1084)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1076)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
>         at java.lang.Thread.run(Thread.java:745)
> 2017-01-05 19:16:36,355 INFO  application.ApplicationImpl (ApplicationImpl.java:handle(464)) - Application application_1483640789847_0001 transitioned from RUNNING to null
> {noformat}