[ https://issues.apache.org/jira/browse/YARN-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13830328#comment-13830328 ]

Karthik Kambatla commented on YARN-1430:
----------------------------------------

bq. But as of today, we are treating them inconsistently. An invalid event to 
the scheduler crashes the RM, but an invalid event in RMNode doesn't. We need 
to be consistent.
I think it is reasonable to be inconsistent here. The rationale is that we 
should crash the RM only when there is no way to continue: only some 
InvalidStateTransitions (e.g., in the scheduler) affect the whole cluster, 
while others are specific to a node or an application. For localized damage, 
crashing the RM seems too aggressive. I agree we should surface these in the UI.

> InvalidStateTransition exceptions are ignored in state machines
> ---------------------------------------------------------------
>
>                 Key: YARN-1430
>                 URL: https://issues.apache.org/jira/browse/YARN-1430
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Omkar Vinit Joshi
>            Assignee: Omkar Vinit Joshi
>
> All of our state machines ignore InvalidStateTransitions. These exceptions 
> get logged but do not crash the RM / NM. We should definitely crash the 
> daemon, since these transitions move the system into an invalid / 
> unacceptable state.
> * Places where we hide this exception:
> ** JobImpl
> ** TaskAttemptImpl
> ** TaskImpl
> ** NMClientAsyncImpl
> ** ApplicationImpl
> ** ContainerImpl
> ** LocalizedResource
> ** RMAppAttemptImpl
> ** RMAppImpl
> ** RMContainerImpl
> ** RMNodeImpl
> Thoughts?
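To make "ignored" concrete, here is a hypothetical table-driven state machine (not YARN's `StateMachineFactory`) in which an unregistered (state, event) pair throws, and `handle()` logs the exception and carries on, which is the behaviour the issue describes. `LoggingStateMachine`, its enums, and its exception class are illustrative names only.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.logging.Logger;

/**
 * Sketch of a table-driven state machine whose handler logs and swallows
 * invalid transitions. All names are hypothetical, not YARN classes.
 */
public class LoggingStateMachine {

    enum State { NEW, RUNNING, FINISHED }
    enum Event { START, FINISH }

    static class InvalidTransitionException extends RuntimeException {
        InvalidTransitionException(State s, Event e) {
            super("Invalid event " + e + " at state " + s);
        }
    }

    private static final Logger LOG =
            Logger.getLogger(LoggingStateMachine.class.getName());

    // Transition table: (current state, event) -> next state.
    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State current = State.NEW;

    LoggingStateMachine() {
        addTransition(State.NEW, Event.START, State.RUNNING);
        addTransition(State.RUNNING, Event.FINISH, State.FINISHED);
    }

    private void addTransition(State from, Event on, State to) {
        table.computeIfAbsent(from, k -> new EnumMap<>(Event.class)).put(on, to);
    }

    private State doTransition(Event event) {
        State next = table.getOrDefault(current, Map.of()).get(event);
        if (next == null) {
            throw new InvalidTransitionException(current, event);
        }
        return next;
    }

    /** Log-and-continue handling: the invalid event is dropped, no crash. */
    void handle(Event event) {
        try {
            current = doTransition(event);
        } catch (InvalidTransitionException e) {
            LOG.severe("Can't handle this event at current state: " + e.getMessage());
            // The object stays in its previous state and the process keeps running.
        }
    }

    State getState() {
        return current;
    }

    public static void main(String[] args) {
        LoggingStateMachine sm = new LoggingStateMachine();
        sm.handle(Event.FINISH);           // invalid in NEW: logged, then ignored
        System.out.println(sm.getState()); // NEW
        sm.handle(Event.START);
        System.out.println(sm.getState()); // RUNNING
    }
}
```

Crashing instead, as proposed, would amount to rethrowing from the `catch` block (or not catching at all) so the dispatcher's error handling can stop the daemon.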



--
This message was sent by Atlassian JIRA
(v6.1#6144)