[ 
https://issues.apache.org/jira/browse/YARN-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829229#comment-13829229
 ] 

Vinod Kumar Vavilapalli commented on YARN-1430:
-----------------------------------------------

There are pros and cons to both approaches.

If we completely ignore the errors, nobody finds out about the problem. One
solution is to have these invalid transitions bubble up to the UI (the RM UI,
the AM UI, etc.) in wild, bold, red text.

On the other hand, I agree that crashing the RM all the time is going to be
more and more painful in production environments.

As for tests, I think we clearly SHOULD crash the tests, so that we can catch
as many of these errors as quickly as possible.

But as of today, we are treating them inconsistently. An invalid event to the
scheduler crashes the RM, but an invalid event in RMNode does not. We need to
be consistent.
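
To make the consistency point concrete, here is a minimal sketch of one shared
policy that every state machine could route invalid transitions through: fail
fast in tests, record and continue in production so the errors can be surfaced
later (for example on a UI). All names here (InvalidTransition, FailureMode,
InvalidTransitionPolicy) are hypothetical stand-ins for illustration, not
existing YARN classes, and the state and event names are made up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the exception a state machine raises on an invalid event.
class InvalidTransition extends RuntimeException {
    InvalidTransition(String state, String event) {
        super("Invalid event " + event + " at state " + state);
    }
}

// Assumed switch: tests would use FAIL_FAST, production RECORD_AND_CONTINUE.
enum FailureMode { FAIL_FAST, RECORD_AND_CONTINUE }

// Single place where every component reports an invalid transition,
// so that all of them behave the same way.
class InvalidTransitionPolicy {
    private final FailureMode mode;
    private final List<String> reported = new ArrayList<>();

    InvalidTransitionPolicy(FailureMode mode) {
        this.mode = mode;
    }

    // Rethrows in FAIL_FAST mode; otherwise remembers the error for later display.
    synchronized void handle(String component, InvalidTransition e) {
        if (mode == FailureMode.FAIL_FAST) {
            throw e;
        }
        reported.add(component + ": " + e.getMessage());
    }

    // Read-only snapshot of recorded errors, e.g. for rendering on a status page.
    synchronized List<String> reported() {
        return Collections.unmodifiableList(new ArrayList<>(reported));
    }
}

class InvalidTransitionPolicyDemo {
    public static void main(String[] args) {
        // Production-style policy: the error is kept, the process keeps running.
        InvalidTransitionPolicy production =
            new InvalidTransitionPolicy(FailureMode.RECORD_AND_CONTINUE);
        production.handle("RMNodeImpl", new InvalidTransition("NEW", "CLEANUP"));
        System.out.println(production.reported());

        // Test-style policy: the same call propagates the exception.
        InvalidTransitionPolicy tests = new InvalidTransitionPolicy(FailureMode.FAIL_FAST);
        try {
            tests.handle("scheduler", new InvalidTransition("NEW", "CLEANUP"));
        } catch (InvalidTransition e) {
            System.out.println("test run aborted: " + e.getMessage());
        }
    }
}
```

The point of the sketch is only that the decision lives in one place; whether
production should record, crash, or do something in between is still the open
question in this issue.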

> InvalidStateTransition exceptions are ignored in state machines
> ---------------------------------------------------------------
>
>                 Key: YARN-1430
>                 URL: https://issues.apache.org/jira/browse/YARN-1430
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Omkar Vinit Joshi
>            Assignee: Omkar Vinit Joshi
>
> We have all state machines ignoring InvalidStateTransitions. These exceptions 
> will get logged but will not crash the RM / NM. We definitely should crash it 
> as they move the system into some invalid / unacceptable state.
> * Places where we hide this exception :-
> ** JobImpl
> ** TaskAttemptImpl
> ** TaskImpl
> ** NMClientAsyncImpl
> ** ApplicationImpl
> ** ContainerImpl
> ** LocalizedResource
> ** RMAppAttemptImpl
> ** RMAppImpl
> ** RMContainerImpl
> ** RMNodeImpl
> thoughts?



--
This message was sent by Atlassian JIRA
(v6.1#6144)
