[ https://issues.apache.org/jira/browse/YARN-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13830328#comment-13830328 ]
Karthik Kambatla commented on YARN-1430:
----------------------------------------

bq. But as of today, we are treating them inconsistently. An invalid event to the scheduler crashes the RM but an invalid event in RMNode doesn't. We need to be consistent.

I think it is reasonable to be inconsistent here. The rationale is that we should crash the RM only when there is absolutely no way to continue: only some InvalidStateTransitions (e.g. in the scheduler) affect everything on the cluster, while others are specific to a node or an app. For localized damage, crashing the RM seems too aggressive.

I agree we should bubble these up to the UI.

> InvalidStateTransition exceptions are ignored in state machines
> ---------------------------------------------------------------
>
> Key: YARN-1430
> URL: https://issues.apache.org/jira/browse/YARN-1430
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Omkar Vinit Joshi
> Assignee: Omkar Vinit Joshi
>
> We have all state machines ignoring InvalidStateTransitions. These exceptions
> will get logged but will not crash the RM / NM. We definitely should crash it
> as they move the system into some invalid / unacceptable state.
> * Places where we hide this exception :-
> ** JobImpl
> ** TaskAttemptImpl
> ** TaskImpl
> ** NMClientAsyncImpl
> ** ApplicationImpl
> ** ContainerImpl
> ** LocalizedResource
> ** RMAppAttemptImpl
> ** RMAppImpl
> ** RMContainerImpl
> ** RMNodeImpl
> thoughts?

--
This message was sent by Atlassian JIRA
(v6.1#6144)
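
To make the policy argued for in the comment concrete, here is a minimal Java sketch, not actual YARN code. It assumes each state machine can be tagged with the scope of damage an invalid transition could cause. All names here (InvalidTransitionPolicy, Scope, InvalidTransitionException, FatalStateException, reportedFailures) are hypothetical and do not exist in Hadoop. Cluster-wide failures are escalated so that the caller could shut down the RM. Node- or app-level failures are recorded so that something like the UI could display them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types are real YARN classes.
final class InvalidTransitionPolicy {

    /** How far the damage from an invalid transition can spread (assumed tagging). */
    enum Scope { CLUSTER, NODE, APP }

    /** Stand-in for the exception a state machine throws on an unexpected event. */
    static final class InvalidTransitionException extends RuntimeException {
        InvalidTransitionException(String state, String event) {
            super("Invalid event " + event + " in state " + state);
        }
    }

    /** Raised only for failures that cannot be contained; the caller would stop the RM. */
    static final class FatalStateException extends RuntimeException {
        FatalStateException(Throwable cause) {
            super(cause);
        }
    }

    // Localized failures are kept here so they can be surfaced later, e.g. in a UI.
    private final List<String> reported = new ArrayList<>();

    /** Escalates cluster-wide failures; records node- and app-level ones. */
    void handle(Scope scope, String component, InvalidTransitionException e) {
        if (scope == Scope.CLUSTER) {
            throw new FatalStateException(e);
        }
        reported.add(component + ": " + e.getMessage());
    }

    /** Returns a copy of the failures recorded so far. */
    List<String> reportedFailures() {
        return new ArrayList<>(reported);
    }
}
```

Under this sketch, an invalid event in a component like RMNodeImpl would be reported with Scope.NODE and the RM would keep running. The same exception reported with Scope.CLUSTER would propagate as a FatalStateException.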