[ https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zhihai xu updated YARN-3242:
----------------------------
Attachment: YARN-3242.000.patch
> Old ZK client session watcher event messed up new ZK client session due to
> ZooKeeper asynchronously closing client session.
> ---------------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-3242
> URL: https://issues.apache.org/jira/browse/YARN-3242
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0
> Reporter: zhihai xu
> Assignee: zhihai xu
> Priority: Critical
> Attachments: YARN-3242.000.patch
>
>
> A watcher event from an old ZK client session can corrupt the state of a new
> ZK client session, because ZooKeeper closes client sessions asynchronously.
> A watcher event from the old ZK client session can still be delivered to
> ZKRMStateStore even after the old ZK client session has been closed.
> This causes a serious problem: ZKRMStateStore gets out of sync with the
> ZooKeeper session.
> There is only one ZKRMStateStore, but there can be multiple ZK client sessions.
> Currently ZKRMStateStore#processWatchEvent doesn't check whether a watcher
> event comes from the current session, so a watcher event from an old ZK client
> session that has just been closed will still be processed.
> For example, if a Disconnected event is received from the old session after
> the new session is connected, zkClient will be set to null:
> {code}
> case Disconnected:
>   LOG.info("ZKRMStateStore Session disconnected");
>   oldZkClient = zkClient;
>   zkClient = null;
>   break;
> {code}
> ZKRMStateStore then won't receive a SyncConnected event from the new session,
> because the new session is already in the SyncConnected state and won't send
> another SyncConnected event until it is disconnected and connected again.
> From then on, all ZKRMStateStore operations fail with the IOException
> "Wait for ZKClient creation timed out" until the RM shuts down.
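The failure mode described above, and one possible guard against it, can be illustrated with a minimal, self-contained sketch. It is not the content of YARN-3242.000.patch and does not use the ZooKeeper API: the `Session`, `Store`, and `State` types, along with `startNewSession` and `isConnected`, are hypothetical stand-ins invented for illustration. The only idea taken from the description is that the event handler should check which session an event came from before acting on it.

```java
public class SessionGuardSketch {
    /** Hypothetical stand-in for the two connection states discussed above. */
    enum State { SYNC_CONNECTED, DISCONNECTED }

    /** Hypothetical stand-in for one ZK client session. */
    static final class Session {
        final String name;
        Session(String name) { this.name = name; }
    }

    /** Hypothetical stand-in for the single store shared by all sessions. */
    static final class Store {
        private Session activeSession; // the session the store currently trusts
        private Session zkClient;      // null while the store considers itself disconnected

        /** Switches the store to a new session; it stays disconnected until that session reports in. */
        synchronized void startNewSession(Session session) {
            activeSession = session;
            zkClient = null;
        }

        /** Applies an event only if it comes from the active session; returns false if it was dropped. */
        synchronized boolean processWatchEvent(Session source, State state) {
            if (source != activeSession) {
                return false; // stale event from an old session: ignore it
            }
            switch (state) {
                case SYNC_CONNECTED:
                    zkClient = source;
                    break;
                case DISCONNECTED:
                    zkClient = null;
                    break;
            }
            return true;
        }

        synchronized boolean isConnected() {
            return zkClient != null;
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        Session oldSession = new Session("old");
        Session newSession = new Session("new");

        store.startNewSession(oldSession);
        store.processWatchEvent(oldSession, State.SYNC_CONNECTED);

        // The store moves to a new session, which connects normally.
        store.startNewSession(newSession);
        store.processWatchEvent(newSession, State.SYNC_CONNECTED);

        // A late Disconnected event from the old session arrives afterwards.
        boolean applied = store.processWatchEvent(oldSession, State.DISCONNECTED);
        System.out.println("late event applied: " + applied
            + ", connected: " + store.isConnected());
    }
}
```

Without the identity check at the top of `processWatchEvent`, the late Disconnected event would null out `zkClient`, and no further SyncConnected event would arrive to restore it, which matches the stuck state reported in this issue.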
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)