[ 
https://issues.apache.org/jira/browse/YARN-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-2416:
----------------------------------
    Attachment: YARN-2416.001.patch

Assumptions for this test. Design is to simply allow REGISTERED before and 
LAUNCHED events arrive out of order.
Considerations
 - Launched transition populates the launch am end time, calculates and 
publishes the AM launch delay metric, updates the diagnostics (overwritten by 
the REGISTERED event), and starts the AM liveliness monitor. This patch calls 
the AM launched transition manually from the registered event. This prevents 
duplication of the code, but does fundamentally separate the code from the 
Registration transition. Another way to do this would be to calculate the 
cluster metrics AM Register Delay as 0, and when the LAUNCHED event comes in to 
not update the diagnostics if we are in the running state.

> InvalidStateTransitonException in ResourceManager if AMLauncher does not 
> receive response for startContainers() call in time
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-2416
>                 URL: https://issues.apache.org/jira/browse/YARN-2416
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.4.0
>            Reporter: Jian Fang
>            Assignee: Jonathan Eagles
>            Priority: Critical
>         Attachments: YARN-2416.001.patch
>
>
> AMLauncher calls startContainers(allRequests) to launch a container for 
> application master. Normally, the call comes back immediately so that the 
> RMAppAttempt changes its state from ALLOCATED to LAUNCHED. 
> However, we do observed that in some cases, the RPC call came back very late 
> but the AM container was already started. Because the RMAppAttempt stuck in 
> ALLOCATED state, once resource manager received the REGISTERED event from the 
> application master, it threw InvalidStateTransitonException as follows.
> 2014-07-05 08:59:05,021 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> REGISTERED at ALLOCATED
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:652)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:752)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733)
>         at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>         at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>         at java.lang.Thread.run(Thread.java:744)
> For subsequent STATUS_UPDATE and CONTAINER_ALLOCATED events for this job, 
> resource manager kept throwing InvalidStateTransitonException.
> 2014-07-05 08:59:06,152 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> STATUS_UPDATE at ALLOCATED
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:652)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:752)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733)
>         at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>         at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>         at java.lang.Thread.run(Thread.java:744)
> 2014-07-05 08:59:07,779 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_1404549222428_0001_02_000002 Container Transitioned from NEW to
>  ALLOCATED
> 2014-07-05 08:59:07,779 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> CONTAINER_ALLOCATED at ALLOCATED
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:652)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:752)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733)
>         at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>         at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>         at java.lang.Thread.run(Thread.java:744)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to