[ https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16310417#comment-16310417 ]
Jason Lowe commented on YARN-7663:
----------------------------------
Thanks for updating the patch!
The unit test passes even without the fix, so it does not work well as a
regression test. The problem is that the test needs some way to recognize an
invalid transition occurred. Currently it simply logs a message which is not
detectable by a unit test. One way to work around this is to have RMAppImpl
call a new, empty protected method, e.g. onInvalidStateTransition, that the
unit test can override to detect when invalid transitions occur.
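A minimal sketch of the suggested hook, assuming a much-simplified, hypothetical `ToyApp` in place of RMAppImpl (the real class is built on `StateMachineFactory`, which is not reproduced here). The default `onInvalidStateTransition` body is empty, so production behavior is unchanged. A test subclass overrides it to record invalid transitions and can then assert that none occurred. The helper `invalidTransitionsFor` is also hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class InvalidTransitionHookDemo {

    enum State { NEW, SUBMITTED, KILLED }

    enum Event { START, KILL }

    /** Hypothetical, much-simplified stand-in for RMAppImpl. */
    static class ToyApp {
        private State state = State.NEW;

        State getState() {
            return state;
        }

        void handle(Event event) {
            State next = nextState(state, event);
            if (next == null) {
                // Current behavior: only a log line, which a unit test cannot see.
                System.err.println("Invalid event: " + event + " at " + state);
                // Proposed addition: an overridable hook that a test can observe.
                onInvalidStateTransition(event, state);
                return;
            }
            state = next;
        }

        /** Empty by default, so production behavior is unchanged. */
        protected void onInvalidStateTransition(Event event, State currentState) {
        }

        /** Returns the next state, or null if the event is invalid in this state. */
        private static State nextState(State current, Event event) {
            switch (current) {
                case NEW:
                    return event == Event.START ? State.SUBMITTED : State.KILLED;
                case SUBMITTED:
                    return event == Event.KILL ? State.KILLED : null;
                default:
                    return null; // KILLED is terminal in this toy model
            }
        }
    }

    /** Feeds events to a ToyApp whose hook records every invalid transition. */
    static List<String> invalidTransitionsFor(Event... events) {
        List<String> invalid = new ArrayList<>();
        ToyApp app = new ToyApp() {
            @Override
            protected void onInvalidStateTransition(Event event, State currentState) {
                invalid.add(event + " at " + currentState);
            }
        };
        for (Event e : events) {
            app.handle(e);
        }
        return invalid;
    }

    public static void main(String[] args) {
        // A KILL that races ahead of START: the test can now see the failure.
        System.out.println(invalidTransitionsFor(Event.KILL, Event.START));
    }
}
```

Running `main` prints `[START at KILLED]` to stdout. A regression test built this way would assert that the recorded list is empty, so it fails until the late event is handled.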
Speaking of invalid transitions, there's another one happening in the unit
test. In the test's output:
{noformat}
2018-01-03 15:41:13,044 ERROR [Thread-96] rmapp.RMAppImpl (RMAppImpl.java:handle(886)) - App: application_1515015672238_0022 can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: APP_UPDATE_SAVED at KILLED
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:884)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:119)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.sendAppUpdateSavedEvent(TestRMAppTransitions.java:492)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppStartAfterKilled(TestRMAppTransitions.java:1168)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{noformat}
When an app is killed in the NEW state, it is never stored in the state store
and simply moves to the KILLED state. That's why we get an invalid state
transition when we try to send a state store event: the state store should
not be involved anymore once we're in the KILLED state. Actually, that's
probably a bug. I suspect apps that are killed from the NEW state don't end
up getting recovered on RM restart. That could be fixed as part of this JIRA,
but it may be better to defer that to another JIRA.
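To illustrate why the extra event is rejected, here is a toy transition table loosely modeled on the description above, not on the real RMAppImpl table (which has many more states and transitions). The state and event names are borrowed for readability; `KilledFromNewDemo` and `next` are hypothetical. A KILL in NEW jumps straight to KILLED without passing through a saving state, and nothing is registered for KILLED, so a late APP_UPDATE_SAVED has no valid transition.

```java
import java.util.EnumMap;
import java.util.Map;

public class KilledFromNewDemo {

    enum State { NEW, NEW_SAVING, SUBMITTED, KILLED }

    enum Event { START, KILL, APP_NEW_SAVED, APP_UPDATE_SAVED }

    /** Toy transition table: (current state, event) -> next state. */
    private static final Map<State, Map<Event, State>> TABLE = new EnumMap<>(State.class);

    static {
        add(State.NEW, Event.START, State.NEW_SAVING);   // persist before submitting
        add(State.NEW, Event.KILL, State.KILLED);        // shortcut: store never touched
        add(State.NEW_SAVING, Event.APP_NEW_SAVED, State.SUBMITTED);
        add(State.SUBMITTED, Event.KILL, State.KILLED);
        // Nothing is registered for KILLED, so every event there is invalid.
    }

    private static void add(State from, Event event, State to) {
        TABLE.computeIfAbsent(from, s -> new EnumMap<>(Event.class)).put(event, to);
    }

    /** Returns the next state, or throws if no transition is registered. */
    static State next(State current, Event event) {
        Map<Event, State> row = TABLE.get(current);
        State to = (row == null) ? null : row.get(event);
        if (to == null) {
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        return to;
    }

    public static void main(String[] args) {
        State s = next(State.NEW, Event.KILL); // killed before START: store skipped
        System.out.println("after KILL from NEW: " + s);
        try {
            next(s, Event.APP_UPDATE_SAVED);   // late state-store event
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running `main` prints `after KILL from NEW: KILLED` followed by `Invalid event: APP_UPDATE_SAVED at KILLED`. Because the NEW to KILLED shortcut never touches the store in this model, a restarted RM would also have nothing to recover for that app, which is consistent with the suspected recovery bug.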
It would be good to clean up the whitespace nits identified by checkstyle as
well.
> RMAppImpl:Invalid event: START at KILLED
> ----------------------------------------
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.8.0
> Reporter: lujie
> Assignee: lujie
> Priority: Minor
> Labels: patch
> Attachments: YARN-7663_1.patch, YARN-7663_2.patch
>
>
> When a kill is sent to the application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: START at KILLED
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted before the point where the START event is created,
> this bug reproduces deterministically.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)