[jira] [Commented] (YARN-9194) Invalid event: REGISTERED at FAILED, and NullPointerException happens in RM while shutdown a NM

Wilfred Spiegelenburg (JIRA) Tue, 15 Jan 2019 18:00:50 -0800


    [ 
https://issues.apache.org/jira/browse/YARN-9194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743551#comment-16743551
 ]


Wilfred Spiegelenburg commented on YARN-9194:
---------------------------------------------

Changing it back to SCHEDULED looks much better. I have one question left on 
this one: the change performs its check after we have already stored the 
container as the master container on the attempt. From that point on, even 
though we have no container, any call to 
{{SchedulerApplicationAttempt.isWaitingForAMContainer()}} will still report 
that an AM container has been allocated. This could affect scheduling.

Should the check not be done before we make any changes to the appAttempt?
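
To illustrate the ordering concern, here is a minimal sketch. It is not the 
actual RM code: {{Attempt}}, {{storeThenCheck}}, {{checkThenStore}} and the 
container id are hypothetical names, and modelling 
{{isWaitingForAMContainer()}} as a simple null check on the stored master 
container is an assumption made only for this example.

```java
public class AmContainerCheckOrder {

    /** Hypothetical stand-in for the app attempt; not the real RMAppAttemptImpl. */
    static final class Attempt {
        private String masterContainer; // null means no AM container is recorded

        void setMasterContainer(String containerId) {
            this.masterContainer = containerId;
        }

        // Assumed, simplified model: we are waiting as long as nothing is stored.
        boolean isWaitingForAMContainer() {
            return masterContainer == null;
        }
    }

    /** Check AFTER storing: a failed check leaves the container on the attempt. */
    static boolean storeThenCheck(Attempt attempt, String containerId, boolean containerUsable) {
        attempt.setMasterContainer(containerId);
        if (!containerUsable) {
            return false; // caller would go back to SCHEDULED, but state is already changed
        }
        return true;
    }

    /** Check BEFORE storing: a failed check leaves the attempt untouched. */
    static boolean checkThenStore(Attempt attempt, String containerId, boolean containerUsable) {
        if (!containerUsable) {
            return false; // nothing was modified, so the attempt is still waiting
        }
        attempt.setMasterContainer(containerId);
        return true;
    }

    public static void main(String[] args) {
        Attempt first = new Attempt();
        storeThenCheck(first, "container_x", false);
        System.out.println("store-then-check, still waiting: " + first.isWaitingForAMContainer());

        Attempt second = new Attempt();
        checkThenStore(second, "container_x", false);
        System.out.println("check-then-store, still waiting: " + second.isWaitingForAMContainer());
    }
}
```

In this model the store-then-check variant reports that it is no longer 
waiting for an AM container even though the container turned out to be 
unusable, while the check-then-store variant keeps reporting that it is 
waiting.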


> Invalid event: REGISTERED at FAILED, and NullPointerException happens in RM 
> while shutdown a NM
> -----------------------------------------------------------------------------------------------
>
>                 Key: YARN-9194
>                 URL: https://issues.apache.org/jira/browse/YARN-9194
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: lujie
>            Assignee: lujie
>            Priority: Critical
>         Attachments: YARN-9194_1.patch, YARN-9194_2.patch, YARN-9194_3.patch, 
> YARN-9194_4.patch, hadoop-hires-resourcemanager-hadoop11.log
>
>
> While the attempt is failing, a REGISTERED event arrives, which causes an 
> InvalidStateTransitionException.
>  
> {code:java}
> 2019-01-13 00:41:57,127 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: App attempt: appattempt_1547311267249_0001_000002 can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: REGISTERED at FAILED
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:913)
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:121)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:1073)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:1054)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:745)
> {code}
>  



