[ https://issues.apache.org/jira/browse/YARN-9194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743551#comment-16743551 ]
Wilfred Spiegelenburg commented on YARN-9194:
---------------------------------------------

Changing it back to SCHEDULED looks much better.

I have one question left on this one: the change performs its check after we have already stored the container as the master container on the attempt. This means that from that point on, even though we no longer have a container, any call to {{SchedulerApplicationAttempt.isWaitingForAMContainer()}} will still report that an AM container is allocated. This could affect scheduling. Should the check not be done before we make any changes to the appAttempt?

> Invalid event: REGISTERED at FAILED, and NullPointerException happens in RM while shutdown a NM
> -----------------------------------------------------------------------------------------------
>
>                 Key: YARN-9194
>                 URL: https://issues.apache.org/jira/browse/YARN-9194
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: lujie
>            Assignee: lujie
>            Priority: Critical
>         Attachments: YARN-9194_1.patch, YARN-9194_2.patch, YARN-9194_3.patch, YARN-9194_4.patch, hadoop-hires-resourcemanager-hadoop11.log
>
>
> While the attempt is failing, the REGISTERED event arrives, hence the InvalidStateTransitionException happens.
>
> {code:java}
> 2019-01-13 00:41:57,127 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: App attempt: appattempt_1547311267249_0001_000002 can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: REGISTERED at FAILED
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:913)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:121)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:1073)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:1054)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
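
The ordering concern raised in the comment can be illustrated with a small sketch. This is not the YARN code or the patch: the class {{AmContainerOrdering}}, the {{Attempt}} type, and both helper methods are hypothetical stand-ins, and the only assumption carried over is that "waiting for an AM container" means no master container has been stored on the attempt yet.

```java
/** Hypothetical illustration only; none of these types are the real YARN classes. */
public class AmContainerOrdering {

    /** Minimal stand-in for an app attempt that only tracks its master container. */
    static final class Attempt {
        private String masterContainerId; // null while no AM container is stored

        boolean isWaitingForAMContainer() {
            return masterContainerId == null;
        }

        void setMasterContainer(String containerId) {
            this.masterContainerId = containerId;
        }
    }

    /** Order questioned in the comment: store the container, then check it. */
    static boolean storeThenCheck(Attempt attempt, String containerId, boolean containerUsable) {
        attempt.setMasterContainer(containerId);
        // If the check fails here, the attempt has already been modified.
        return containerUsable;
    }

    /** Suggested order: check first, and only modify the attempt on success. */
    static boolean checkThenStore(Attempt attempt, String containerId, boolean containerUsable) {
        if (!containerUsable) {
            return false; // attempt is left untouched
        }
        attempt.setMasterContainer(containerId);
        return true;
    }

    public static void main(String[] args) {
        Attempt a = new Attempt();
        storeThenCheck(a, "container_01", false);
        System.out.println("store-then-check, waiting for AM: " + a.isWaitingForAMContainer());

        Attempt b = new Attempt();
        checkThenStore(b, "container_01", false);
        System.out.println("check-then-store, waiting for AM: " + b.isWaitingForAMContainer());
    }
}
```

With the failed check, the first variant leaves the attempt reporting that it is no longer waiting for an AM container, while the second leaves it waiting, which is the state a scheduler would presumably need in order to allocate a new one.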