[ https://issues.apache.org/jira/browse/YARN-4118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733823#comment-14733823 ]
Sunil G commented on YARN-4118:
-------------------------------

Hi [~jianhe]

This is a potential problem with respect to ZK, especially for RMApp and RMAppAttempt. If a store error is not reported and the RM is not fail-fast, the RMApp may remain stuck in NEW_SAVING. So is it OK to fire a failure event directly to RMApp and RMAppAttempt if any of their Store/Update/Remove events fail due to a store exception? Such direct error handling can move the app and app attempts into an error state rather than leaving them in limbo. I would like to try this if it's fine. Please share your thoughts.

> Newly submitted app maybe stuck at saving state if store operation failure is ignored in ZKRMStateStore
> -------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4118
>                 URL: https://issues.apache.org/jira/browse/YARN-4118
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Jian He
>            Assignee: Sunil G
>
> In YARN-2019, we took a decision to ignore the failure and not fail the RM
> when ZK is unavailable.
> However, it leaves newly submitted app stuck at saving state.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
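
To make the proposal in the comment concrete, here is a minimal sketch of the idea: when a store operation throws, a failure event is fired at the app so it leaves NEW_SAVING instead of staying there. This is not the actual YARN code. Every name in it (StoreFailureSketch, App, StateStore, AppEventType, APP_SAVE_FAILED, submit) is a hypothetical stand-in, and the real RMApp state machine, dispatcher, and ZKRMStateStore are considerably more involved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the idea. These are NOT the real YARN classes.
final class StoreFailureSketch {

    enum AppState { NEW, NEW_SAVING, SUBMITTED, FAILED }

    // APP_SAVE_FAILED is an assumed event name used only for this illustration.
    enum AppEventType { START, APP_NEW_SAVED, APP_SAVE_FAILED }

    /** Stand-in for a state store whose writes can fail (for example, ZK unavailable). */
    interface StateStore {
        void storeApp(String appId) throws Exception;
    }

    /** Stand-in for RMApp: a tiny state machine with three transitions. */
    static final class App {
        final String id;
        AppState state = AppState.NEW;
        final List<String> diagnostics = new ArrayList<>();

        App(String id) {
            this.id = id;
        }

        void handle(AppEventType event, String detail) {
            switch (event) {
                case START:
                    if (state == AppState.NEW) {
                        state = AppState.NEW_SAVING;
                    }
                    break;
                case APP_NEW_SAVED:
                    if (state == AppState.NEW_SAVING) {
                        state = AppState.SUBMITTED;
                    }
                    break;
                case APP_SAVE_FAILED:
                    // The proposed transition: a store failure moves the app to an error state.
                    if (state == AppState.NEW_SAVING) {
                        state = AppState.FAILED;
                        diagnostics.add(detail);
                    }
                    break;
            }
        }
    }

    /** Submits an app and reports a store failure to it instead of ignoring the failure. */
    static void submit(App app, StateStore store) {
        app.handle(AppEventType.START, null);
        try {
            store.storeApp(app.id);
            app.handle(AppEventType.APP_NEW_SAVED, null);
        } catch (Exception e) {
            // Without this event the app would remain in NEW_SAVING indefinitely.
            app.handle(AppEventType.APP_SAVE_FAILED, "State store error: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        App healthy = new App("app_1");
        submit(healthy, id -> { });
        System.out.println(healthy.id + " state: " + healthy.state);

        App unlucky = new App("app_2");
        submit(unlucky, id -> { throw new Exception("ZK unavailable"); });
        System.out.println(unlucky.id + " state: " + unlucky.state + " " + unlucky.diagnostics);
    }
}
```

The sketch handles the failure synchronously to keep it short. In a real RM the store operations are event-driven, so the failure notification would presumably travel through the dispatcher as another event rather than through a catch block at the call site.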