[ 
https://issues.apache.org/jira/browse/YARN-4118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733823#comment-14733823
 ] 

Sunil G commented on YARN-4118:
-------------------------------

Hi [~jianhe],
This is a potential problem with ZK, especially for RMApp and RMAppAttempt. If 
a store error is not reported and the RM is not fail-fast, there is a chance 
that the RMApp will stay stuck in NEW_SAVING. So is it OK to fire a failure 
event directly to RMApp and RMAppAttempt if any of their Store/Update/Remove 
events fail due to a store exception? Such direct error handling can move the 
app and app attempts into an error state rather than leaving them in limbo. I 
would like to try this if it's fine. Please share your thoughts.
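
A minimal sketch of the idea above, not the actual RM code. All names here 
(SketchApp, SketchStore, SketchStoreHandler, APP_SAVE_FAILED) are hypothetical, 
simplified stand-ins and are assumptions for illustration only; the real 
RMAppImpl and RMStateStore state machines are considerably more involved.

```java
// Hypothetical stand-ins; these are NOT the real YARN classes or event types.
enum AppState { NEW, NEW_SAVING, SUBMITTED, FAILED }

enum AppEventType { START, APP_NEW_SAVED, APP_SAVE_FAILED }

// Tiny state machine standing in for RMApp.
class SketchApp {
    AppState state = AppState.NEW;

    void handle(AppEventType event) {
        switch (state) {
            case NEW:
                if (event == AppEventType.START) {
                    state = AppState.NEW_SAVING;
                }
                break;
            case NEW_SAVING:
                if (event == AppEventType.APP_NEW_SAVED) {
                    state = AppState.SUBMITTED;
                } else if (event == AppEventType.APP_SAVE_FAILED) {
                    // Assumed new transition: leave NEW_SAVING on store failure.
                    state = AppState.FAILED;
                }
                break;
            default:
                break; // other states ignore these events in this sketch
        }
    }
}

// Stand-in for the state store (e.g. one backed by ZK).
interface SketchStore {
    void storeApp(String appId) throws Exception;
}

// Stand-in for the component that processes store events.
class SketchStoreHandler {
    private final SketchStore store;

    SketchStoreHandler(SketchStore store) {
        this.store = store;
    }

    void storeNewApp(String appId, SketchApp app) {
        try {
            store.storeApp(appId);
            app.handle(AppEventType.APP_NEW_SAVED);
        } catch (Exception e) {
            // Instead of silently ignoring the store exception, notify the
            // app directly so it does not stay in NEW_SAVING.
            app.handle(AppEventType.APP_SAVE_FAILED);
        }
    }
}
```

With a store that always throws, an app that has entered NEW_SAVING ends up in 
FAILED rather than staying in NEW_SAVING. The same pattern would apply to the 
Update and Remove paths and to app attempts.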

> Newly submitted app maybe stuck at saving state if store operation failure is 
> ignored in ZKRMStateStore
> -------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4118
>                 URL: https://issues.apache.org/jira/browse/YARN-4118
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Jian He
>            Assignee: Sunil G
>
> In YARN-2019, we took a decision to ignore the failure and not fail the RM 
> when ZK is unavailable.
> However, it leaves newly submitted app stuck at saving state.  
