[ https://issues.apache.org/jira/browse/YARN-212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495622#comment-13495622 ]
Nathan Roberts commented on YARN-212:
-------------------------------------
The interesting parts of the logs are:
2012-11-07 05:36:33,754 [AsyncDispatcher event handler] INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1351873505780_75229 transitioned from NEW to INITING
2012-11-07 05:36:33,754 [AsyncDispatcher event handler] INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1351873505780_75229_01_000004 to application application_1351873505780_75229
2012-11-07 05:36:33,760 [Node Status Updater] INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 75229, cluster_timestamp: 1351873505780, }, attemptId: 1, }, id: 4, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
2012-11-07 05:36:33,774 [AsyncDispatcher event handler] INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1351873505780_75229_01_000004 transitioned from NEW to DONE
2012-11-07 05:36:33,774 [AsyncDispatcher event handler] WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_CONTAINER_FINISHED at INITING
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:404)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:60)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:570)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:562)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
    at java.lang.Thread.run(Thread.java:619)
2012-11-07 05:36:33,774 [AsyncDispatcher event handler] INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1351873505780_75229 transitioned from INITING to null
2012-11-07 05:36:33,775 [AsyncDispatcher event handler] INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Considering container container_1351873505780_75229_01_000004 for log-aggregation
2012-11-07 05:36:33,775 [AsyncDispatcher event handler] INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1351873505780_75229 transitioned from INITING to RUNNING
2012-11-07 05:36:33,775 [AsyncDispatcher event handler] WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Can't handle this event at current state: Current: [DONE], eventType: [INIT_CONTAINER]
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: INIT_CONTAINER at DONE
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:826)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:71)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:554)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:547)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
    at java.lang.Thread.run(Thread.java:619)
2012-11-07 05:36:33,775 [AsyncDispatcher event handler] INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1351873505780_75229_01_000004 transitioned from DONE to null
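To make the ordering in the excerpt easier to follow, here is a minimal Java replay. It is a hypothetical, heavily simplified model, not Hadoop's StateMachineFactory: each machine is a table of (state, event) pairs, any pair missing from the table is rejected with the same "Invalid event: X at Y" wording seen in the log, and a rejected event leaves the state unchanged (the "transitioned from ... to null" lines are not modelled). The classes Machine and LogReplay, the event names INIT_APPLICATION, KILL_CONTAINER and APPLICATION_INITED, and the LOCALIZING state are assumptions for illustration; the log only shows the resulting transitions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified state machine: NOT Hadoop's StateMachineFactory.
// Transitions are keyed by "STATE/EVENT"; unregistered pairs are rejected.
class Machine {
    String state;
    private final Map<String, String> table = new HashMap<>();

    Machine(String initial) { this.state = initial; }

    Machine on(String from, String event, String to) {
        table.put(from + "/" + event, to);
        return this;
    }

    /** Applies the event; returns an error message (state unchanged) if the pair is not registered. */
    String handle(String event) {
        String to = table.get(state + "/" + event);
        if (to == null) {
            return "Invalid event: " + event + " at " + state;
        }
        state = to;
        return null;
    }
}

class LogReplay {
    /** Replays the events in the order the log shows them being handled. */
    static List<String> replay() {
        Machine app = new Machine("NEW")
                .on("NEW", "INIT_APPLICATION", "INITING")
                .on("INITING", "APPLICATION_INITED", "RUNNING")
                .on("RUNNING", "APPLICATION_CONTAINER_FINISHED", "RUNNING");
        Machine container = new Machine("NEW")
                .on("NEW", "INIT_CONTAINER", "LOCALIZING") // assumed target state
                .on("NEW", "KILL_CONTAINER", "DONE");      // assumed event name

        // {target machine, event} in one shared order, mirroring the single
        // "AsyncDispatcher event handler" thread seen in the log.
        String[][] events = {
                {"app", "INIT_APPLICATION"},               // app NEW -> INITING
                {"container", "KILL_CONTAINER"},           // killed before it was initialized
                {"app", "APPLICATION_CONTAINER_FINISHED"}, // app still INITING: rejected
                {"app", "APPLICATION_INITED"},             // app INITING -> RUNNING
                {"container", "INIT_CONTAINER"},           // container already DONE: rejected
        };

        List<String> rejected = new ArrayList<>();
        for (String[] e : events) {
            Machine target = e[0].equals("app") ? app : container;
            String error = target.handle(e[1]);
            if (error != null) {
                rejected.add(error);
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        replay().forEach(System.out::println);
    }
}
```

Running LogReplay.main prints the same two rejections the issue reports, in log order: APPLICATION_CONTAINER_FINISHED at INITING, then INIT_CONTAINER at DONE. Both follow from a single kill landing while the application is still INITING.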
The fix should be to allow the CONTAINER_DONE_TRANSITION processing to occur
in the INITING state as well. That would remove the container from the list of
containers the application is tracking, so the application can finish cleaning
up when it actually finishes. As it stands, the application thinks this
container is still running and will keep renewing log aggregation leases
forever.
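A rough sketch of the proposed change, using a hypothetical model (TrackedApp, not ApplicationImpl) in which CONTAINER_DONE_TRANSITION is reduced to removing the container from the application's tracked set. The withFix flag, the replay helper and the event names other than APPLICATION_CONTAINER_FINISHED are assumptions; in the NM the change would presumably be one extra entry in ApplicationImpl's transition table.

```java
import java.util.EnumMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the NM application state machine; NOT ApplicationImpl.
class TrackedApp {
    enum State { NEW, INITING, RUNNING }
    enum Event { INIT_APPLICATION, APPLICATION_INITED, APPLICATION_CONTAINER_FINISHED }

    State state = State.NEW;
    // Containers the application still believes are running.
    final Set<String> containers = new LinkedHashSet<>();
    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);

    TrackedApp(boolean withFix) {
        on(State.NEW, Event.INIT_APPLICATION, State.INITING);
        on(State.INITING, Event.APPLICATION_INITED, State.RUNNING);
        on(State.RUNNING, Event.APPLICATION_CONTAINER_FINISHED, State.RUNNING);
        if (withFix) {
            // Proposed fix: also accept the event while INITING (state unchanged),
            // so the container-done processing in handle() still runs.
            on(State.INITING, Event.APPLICATION_CONTAINER_FINISHED, State.INITING);
        }
    }

    private void on(State from, Event event, State to) {
        table.computeIfAbsent(from, s -> new EnumMap<>(Event.class)).put(event, to);
    }

    /** Returns false, changing nothing, if (state, event) is not registered. */
    boolean handle(Event event, String containerId) {
        State to = table.getOrDefault(state, Map.of()).get(event);
        if (to == null) {
            return false;
        }
        if (event == Event.APPLICATION_CONTAINER_FINISHED) {
            containers.remove(containerId); // stand-in for CONTAINER_DONE_TRANSITION
        }
        state = to;
        return true;
    }

    /** Replays the application-side sequence from the log. */
    static TrackedApp replay(boolean withFix) {
        String id = "container_1351873505780_75229_01_000004";
        TrackedApp app = new TrackedApp(withFix);
        app.handle(Event.INIT_APPLICATION, null);
        app.containers.add(id);                               // "Adding container ... to application"
        app.handle(Event.APPLICATION_CONTAINER_FINISHED, id); // container already DONE
        app.handle(Event.APPLICATION_INITED, null);
        return app;
    }

    public static void main(String[] args) {
        System.out.println("without fix: " + replay(false).containers);
        System.out.println("with fix:    " + replay(true).containers);
    }
}
```

Replaying the log sequence, the unpatched model reaches RUNNING while still tracking container_1351873505780_75229_01_000004; with the extra INITING entry it reaches RUNNING with an empty set, so a later application finish would not wait on a container that no longer exists.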
> NM state machine ignores an APPLICATION_CONTAINER_FINISHED event when it shouldn't
> ----------------------------------------------------------------------------------
>
> Key: YARN-212
> URL: https://issues.apache.org/jira/browse/YARN-212
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 0.23.4
> Reporter: Nathan Roberts
> Assignee: Nathan Roberts
> Priority: Blocker
>
> The NM state machines can hit the following two invalid state transitions
> when a speculative attempt is killed shortly after it gets started. When this
> happens, the NM keeps the log aggregation context open for this application
> and therefore chews up file descriptors (FDs) and leases on the NameNode (NN),
> eventually running the NN out of FDs and bringing down the entire cluster.
> 2012-11-07 05:36:33,774 [AsyncDispatcher event handler] WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_CONTAINER_FINISHED at INITING
> 2012-11-07 05:36:33,775 [AsyncDispatcher event handler] WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Can't handle this event at current state: Current: [DONE], eventType: [INIT_CONTAINER]
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: INIT_CONTAINER at DONE
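The leak in the issue description can be illustrated with one more hypothetical model (AggregationLeak; not NM or HDFS code). It assumes, as the comment implies, that the application only starts cleanup once it is finishing and no container is still tracked, and that the aggregated log file (with its FD and NN lease) stays open until then. Each call to tick() stands in for one lease renewal period.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model of the leak; NOT NodeManager or HDFS code.
class AggregationLeak {
    final Set<String> trackedContainers = new LinkedHashSet<>();
    boolean finishRequested = false;
    boolean aggregatorOpen = true; // stands in for the open log file (FD + NN lease)
    int leaseRenewals = 0;

    void containerFinished(String id) {
        trackedContainers.remove(id);
        maybeCleanUp();
    }

    void finishApplication() {
        finishRequested = true;
        maybeCleanUp();
    }

    // Cleanup only happens once the app is finishing AND no container is tracked.
    private void maybeCleanUp() {
        if (finishRequested && trackedContainers.isEmpty()) {
            aggregatorOpen = false; // file closed: FD and lease released
        }
    }

    // One renewal period: while the file is open, its lease keeps being renewed.
    void tick() {
        if (aggregatorOpen) {
            leaseRenewals++;
        }
    }

    /** finishEventDelivered = false models the dropped APPLICATION_CONTAINER_FINISHED event. */
    static AggregationLeak run(boolean finishEventDelivered, int periods) {
        AggregationLeak app = new AggregationLeak();
        String id = "container_1351873505780_75229_01_000004";
        app.trackedContainers.add(id);
        if (finishEventDelivered) {
            app.containerFinished(id);
        }
        app.finishApplication();
        for (int i = 0; i < periods; i++) {
            app.tick();
        }
        return app;
    }

    public static void main(String[] args) {
        AggregationLeak ok = run(true, 1000);
        AggregationLeak leaked = run(false, 1000);
        System.out.println("event delivered: open=" + ok.aggregatorOpen + ", renewals=" + ok.leaseRenewals);
        System.out.println("event dropped:   open=" + leaked.aggregatorOpen + ", renewals=" + leaked.leaseRenewals);
    }
}
```

With the finish event delivered, the file is closed before the first period and no renewals accrue; with it dropped, the application waits indefinitely and renewals grow with every period. Each such application holds its own FD and lease, which is how many killed speculative attempts can add up to the NN exhaustion the issue describes.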