[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037664#comment-14037664
 ] 

Junping Du commented on YARN-2019:
----------------------------------

[~kasha], sorry that I overlooked your comments; my email/company changed during 
that time. My thoughts on the right behavior:
If anything goes wrong on the ZK cluster side (it is distributed and should be 
more robust, but it can still go down due to a bug or bad configuration), we can 
let the active RM continue to run as in the non-HA case. In addition, we should 
report to the admin that HA is not working well, and let the admin decide when 
is the proper time to bring down the RM and reconfigure HA. Make sense?
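
To make the proposal concrete, here is a rough sketch of the idea, not the 
actual patch. Every name in it (DegradedHaSketch, StateStore, HaStatus, 
persist, adminAlerts) is hypothetical and is not a real YARN class or method. 
It assumes that once the store fails, the RM simply stops writing state and 
records an alert for the admin instead of exiting:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: none of these names exist in YARN.
public class DegradedHaSketch {
    /** Stand-in for a ZooKeeper-backed state store. */
    public interface StateStore {
        void store(String appId) throws Exception;
    }

    public enum HaStatus { HEALTHY, DEGRADED }

    private final StateStore store;
    private final List<String> adminAlerts = new ArrayList<>();
    private HaStatus status = HaStatus.HEALTHY;

    public DegradedHaSketch(StateStore store) {
        this.store = store;
    }

    /** Try to persist state; on failure keep running and alert the admin. */
    public void persist(String appId) {
        if (status == HaStatus.DEGRADED) {
            // Assumption: run as non-HA and skip the store until an admin steps in.
            return;
        }
        try {
            store.store(appId);
        } catch (Exception e) {
            // Do not crash the RM; record the problem for the admin instead.
            status = HaStatus.DEGRADED;
            adminAlerts.add("HA state store unavailable: " + e.getMessage());
        }
    }

    public HaStatus status() {
        return status;
    }

    public List<String> adminAlerts() {
        return adminAlerts;
    }

    public static void main(String[] args) {
        // Simulate a store whose backing ZK cluster is down.
        DegradedHaSketch rm = new DegradedHaSketch(appId -> {
            throw new Exception("ZK connection lost");
        });
        rm.persist("app_1");
        rm.persist("app_2"); // skipped: already degraded
        System.out.println(rm.status() + " " + rm.adminAlerts());
    }
}
```

The trade-off in this sketch is that state written after the failure is not 
recoverable on failover, which is why the admin has to be told right away.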

> Retrospect on decision of making RM crashed if any exception throw in 
> ZKRMStateStore
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-2019
>                 URL: https://issues.apache.org/jira/browse/YARN-2019
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Junping Du
>            Priority: Critical
>              Labels: ha
>         Attachments: YARN-2019.1-wip.patch
>
>
> Currently, if anything abnormal happens in ZKRMStateStore, it throws a fatal 
> exception that crashes the RM. As shown in YARN-1924, the cause could be an 
> internal bug in RM HA itself rather than a truly fatal condition. We should 
> revisit this decision, as the HA feature is designed to protect a key 
> component, not to disrupt it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
