Karthik Kambatla created YARN-2063:
--------------------------------------
Summary: ZKRMStateStore: Better handling of operation failures
Key: YARN-2063
URL: https://issues.apache.org/jira/browse/YARN-2063
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
Today, when a ZK operation fails, we handle connection-loss and
operation-timeout the same way. This could be improved in a few ways:
# Add special handling for other error codes
# Connection-loss: set zkClient to null so that a new connection is established on the next attempt
# Operation-timeout: retry a few times with exponential backoff?
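A minimal Java sketch of the proposed handling follows, assuming each ZK operation is wrapped in a retry loop. `ZkError`, `ZkOpException`, `runWithRetries`, `maxRetries` and `baseDelayMs` are hypothetical names. The enum and exception stand in for ZooKeeper's `KeeperException` codes so the example compiles without the ZooKeeper jar. This is not the actual ZKRMStateStore code. On connection-loss it drops the client so the next attempt reconnects. On operation-timeout it keeps the session and retries with doubling delays. Any other code is rethrown immediately.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

/**
 * Sketch of per-error-code handling for ZK operations.
 * ZkError and ZkOpException are hypothetical stand-ins for ZooKeeper's
 * KeeperException codes, so this compiles without the ZooKeeper jar.
 */
public class ZkRetrySketch {

    /** Hypothetical subset of ZooKeeper error codes. */
    enum ZkError { CONNECTION_LOSS, OPERATION_TIMEOUT, NO_NODE, NODE_EXISTS }

    /** Hypothetical exception carrying a ZkError code. */
    static class ZkOpException extends Exception {
        final ZkError code;

        ZkOpException(ZkError code) {
            super(code.name());
            this.code = code;
        }
    }

    private final int maxRetries;
    private final long baseDelayMs;

    // Hypothetical client handle; null means a new connection must be created.
    Object zkClient = new Object();
    int reconnects = 0;
    final List<Long> delaysMs = new ArrayList<>();

    ZkRetrySketch(int maxRetries, long baseDelayMs) {
        this.maxRetries = maxRetries;
        this.baseDelayMs = baseDelayMs;
    }

    /** Runs op, retrying only on connection-loss and operation-timeout. */
    <T> T runWithRetries(Callable<T> op) throws Exception {
        int attempt = 0;
        while (true) {
            if (zkClient == null) {
                zkClient = new Object(); // stand-in for creating a new ZooKeeper client
                reconnects++;
            }
            try {
                return op.call();
            } catch (ZkOpException e) {
                switch (e.code) {
                    case CONNECTION_LOSS:
                        zkClient = null; // drop the client so the next attempt reconnects
                        break;
                    case OPERATION_TIMEOUT:
                        break; // keep the session; just back off and retry
                    default:
                        throw e; // NO_NODE, NODE_EXISTS, ...: retrying will not help
                }
                attempt++;
                if (attempt > maxRetries) {
                    throw e; // retries exhausted: surface the last failure
                }
                long delay = baseDelayMs << (attempt - 1); // 1x, 2x, 4x, ... the base delay
                delaysMs.add(delay);
                Thread.sleep(delay);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ZkRetrySketch store = new ZkRetrySketch(3, 10);
        int[] calls = {0};
        String result = store.runWithRetries(() -> {
            calls[0]++;
            if (calls[0] == 1) throw new ZkOpException(ZkError.CONNECTION_LOSS);
            if (calls[0] == 2) throw new ZkOpException(ZkError.OPERATION_TIMEOUT);
            return "ok";
        });
        System.out.println(result + " after " + calls[0] + " calls, delays " + store.delaysMs
                + ", reconnects " + store.reconnects);
    }
}
```

Running `main` prints `ok after 3 calls, delays [10, 20], reconnects 1`. Rethrowing non-retriable codes right away keeps errors such as a missing or already-existing znode from being masked by pointless retries.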
--
This message was sent by Atlassian JIRA
(v6.2#6252)