[
https://issues.apache.org/jira/browse/YARN-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13830328#comment-13830328
]
Karthik Kambatla commented on YARN-1430:
----------------------------------------
bq. But as of today, we are treating them inconsistently. An invalid event to
the scheduler crashes the RM but an invalid event in RMNode doesn't. We need to
be consistent.
I think it is reasonable to be inconsistent here. The rationale is that we
should crash the RM only when there is no way to keep going: only some
InvalidStateTransitions (e.g. in the scheduler) affect everything on the
cluster, while others are specific to a node or an app. For localized damage,
crashing the RM seems too aggressive. I agree we should bubble these up to the
UI.
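A rough sketch of that policy follows, only to make the distinction concrete.
Every name in it (Scope, InvalidTransitionException, handle, reportedErrors) is
hypothetical and stands in for the real YARN classes; it is not how the RM is
implemented today.

```java
import java.util.ArrayList;
import java.util.List;

public class TransitionPolicySketch {

    /** Hypothetical: how far the damage from an invalid transition spreads. */
    enum Scope { CLUSTER, NODE, APP }

    /** Stand-in for YARN's invalid-transition exception; not the real class. */
    static class InvalidTransitionException extends RuntimeException {
        InvalidTransitionException(String message) {
            super(message);
        }
    }

    /** Errors kept for display; stands in for "bubble up to the UI". */
    static final List<String> reportedErrors = new ArrayList<>();

    /**
     * Decides what to do with an invalid transition.
     * Returns true if the caller should shut the RM down, false if the
     * error was only logged and recorded for reporting.
     */
    static boolean handle(Scope scope, String component,
                          InvalidTransitionException e) {
        String message = component + ": " + e.getMessage();
        if (scope == Scope.CLUSTER) {
            // Cluster-wide damage (e.g. the scheduler): no safe way to go on.
            System.err.println("FATAL " + message);
            return true;
        }
        // Damage limited to one node or one app: log it, record it, continue.
        System.err.println("ERROR " + message);
        reportedErrors.add(message);
        return false;
    }

    public static void main(String[] args) {
        boolean crashForScheduler = handle(Scope.CLUSTER, "scheduler",
                new InvalidTransitionException("invalid event"));
        boolean crashForNode = handle(Scope.NODE, "RMNode",
                new InvalidTransitionException("invalid event"));
        System.out.println("crash for scheduler: " + crashForScheduler);
        System.out.println("crash for node: " + crashForNode);
        System.out.println("reported to UI: " + reportedErrors);
    }
}
```

With something like this, each state machine's catch block would pass in its
own scope instead of deciding on its own whether to swallow the exception, so
the inconsistency becomes an explicit policy rather than an accident.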
> InvalidStateTransition exceptions are ignored in state machines
> ---------------------------------------------------------------
>
> Key: YARN-1430
> URL: https://issues.apache.org/jira/browse/YARN-1430
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Omkar Vinit Joshi
> Assignee: Omkar Vinit Joshi
>
> All of our state machines ignore InvalidStateTransitions. These exceptions
> get logged but do not crash the RM / NM. We should definitely crash the
> daemon, as they move the system into an invalid / unacceptable state.
> * Places where we hide this exception :-
> ** JobImpl
> ** TaskAttemptImpl
> ** TaskImpl
> ** NMClientAsyncImpl
> ** ApplicationImpl
> ** ContainerImpl
> ** LocalizedResource
> ** RMAppAttemptImpl
> ** RMAppImpl
> ** RMContainerImpl
> ** RMNodeImpl
> thoughts?