[ https://issues.apache.org/jira/browse/YARN-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829229#comment-13829229 ]
Vinod Kumar Vavilapalli commented on YARN-1430:
-----------------------------------------------

There are pros and cons to both approaches. If we completely ignore the errors, nobody knows about the problem. One solution is to have these invalid transitions bubble up to the UI, say on the RM UI, AM UI, etc., in wild, bold and red colors. On the other side, I agree that crashing the RM all the time is going to be more and more painful in production environments.

As for tests, I think we SHOULD clearly crash the tests, so that we can catch as many of these errors as quickly as possible.

But as of today, we are treating them inconsistently. An invalid event to the scheduler crashes the RM but an invalid event in RMNode doesn't. We need to be consistent.

> InvalidStateTransition exceptions are ignored in state machines
> ---------------------------------------------------------------
>
>                 Key: YARN-1430
>                 URL: https://issues.apache.org/jira/browse/YARN-1430
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Omkar Vinit Joshi
>            Assignee: Omkar Vinit Joshi
>
> We have all state machines ignoring InvalidStateTransitions. These exceptions
> will get logged but will not crash the RM / NM. We definitely should crash it
> as they move the system into some invalid / unacceptable state.
> * Places where we hide this exception :-
> ** JobImpl
> ** TaskAttemptImpl
> ** TaskImpl
> ** NMClientAsyncImpl
> ** ApplicationImpl
> ** ContainerImpl
> ** LocalizedResource
> ** RMAppAttemptImpl
> ** RMAppImpl
> ** RMContainerImpl
> ** RMNodeImpl
> thoughts?
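
To make the trade-off in the comment concrete, here is a minimal Java sketch of one way a single, consistent policy could look: every state machine consults the same switch, which fails fast (as suggested for tests) or records the invalid transition so it can be surfaced later (as suggested for production). This is not the YARN state machine API. All names here (TransitionPolicySketch, InvalidTransitionPolicy, NodeStateMachine and its states and events) are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these types exist in Hadoop.
final class TransitionPolicySketch {

    enum NodeState { NEW, RUNNING, DECOMMISSIONED }

    enum NodeEvent { STARTED, DECOMMISSION }

    /** What to do when an event has no transition from the current state. */
    enum InvalidTransitionPolicy { FAIL_FAST, RECORD_AND_CONTINUE }

    static final class InvalidTransitionException extends RuntimeException {
        InvalidTransitionException(NodeState state, NodeEvent event) {
            super("Invalid event " + event + " in state " + state);
        }
    }

    static final class NodeStateMachine {
        // Transition table: current state -> (event -> next state).
        private final Map<NodeState, Map<NodeEvent, NodeState>> table =
            new EnumMap<NodeState, Map<NodeEvent, NodeState>>(NodeState.class);
        private final InvalidTransitionPolicy policy;
        // Invalid transitions seen so far; a UI or metrics layer could read these.
        private final List<String> invalidTransitions = new ArrayList<String>();
        private NodeState current = NodeState.NEW;

        NodeStateMachine(InvalidTransitionPolicy policy) {
            this.policy = policy;
            addTransition(NodeState.NEW, NodeEvent.STARTED, NodeState.RUNNING);
            addTransition(NodeState.RUNNING, NodeEvent.DECOMMISSION,
                NodeState.DECOMMISSIONED);
        }

        private void addTransition(NodeState from, NodeEvent on, NodeState to) {
            table.computeIfAbsent(from,
                k -> new EnumMap<NodeEvent, NodeState>(NodeEvent.class)).put(on, to);
        }

        void handle(NodeEvent event) {
            Map<NodeEvent, NodeState> row = table.get(current);
            NodeState next = (row == null) ? null : row.get(event);
            if (next != null) {
                current = next;
                return;
            }
            InvalidTransitionException e =
                new InvalidTransitionException(current, event);
            if (policy == InvalidTransitionPolicy.FAIL_FAST) {
                // Test-style behavior: surface the bug immediately.
                throw e;
            }
            // Production-style behavior: keep running, but do not lose the error.
            invalidTransitions.add(e.getMessage());
        }

        NodeState getState() {
            return current;
        }

        List<String> getInvalidTransitions() {
            return new ArrayList<String>(invalidTransitions);
        }
    }
}
```

Under this sketch, a test harness would construct each state machine with FAIL_FAST, while a production daemon would use RECORD_AND_CONTINUE and expose getInvalidTransitions() wherever operators look. Because the policy is one shared switch, the scheduler and RMNode would no longer behave differently on the same class of error.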