[ https://issues.apache.org/jira/browse/YARN-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karthik Kambatla updated YARN-1934:
-----------------------------------
Attachment: yarn-1934-0.patch
Here is a patch that replaces all direct references to zkClient and uses
runWithCheck instead.
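Below is a minimal Java sketch of how a check-before-use helper in the spirit of runWithCheck could work. This is an assumption, not the code in yarn-1934-0.patch. Each operation does not dereference a field that the Disconnected handler may have nulled. Instead, it waits up to a timeout for a client to be present, then runs the action while holding the same monitor the event handlers use. `ClientHolder`, `onConnected`, `onDisconnected` and `timeoutMs` are hypothetical names. The type parameter `C` stands in for the ZooKeeper client.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

/**
 * Hypothetical holder for a ZooKeeper-like client (type C) that is null while
 * the session is disconnected. Not the actual ZKRMStateStore code.
 */
final class ClientHolder<C> {
  private C client; // guarded by "this"; null while disconnected

  /** Called by the (hypothetical) watcher on a Disconnected event. */
  synchronized void onDisconnected() {
    client = null;
  }

  /** Called once a session is (re)established; wakes up waiting operations. */
  synchronized void onConnected(C newClient) {
    client = newClient;
    notifyAll();
  }

  /**
   * Runs the action against the current client. If no client is present,
   * waits up to timeoutMs for one instead of dereferencing null.
   */
  synchronized <T> T runWithCheck(Function<? super C, ? extends T> action, long timeoutMs)
      throws InterruptedException, TimeoutException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (client == null) {
      long remaining = deadline - System.currentTimeMillis();
      if (remaining <= 0) {
        throw new TimeoutException("No ZK client available after " + timeoutMs + " ms");
      }
      wait(remaining); // releases the monitor; the loop re-checks after wake-up
    }
    // Still holding the monitor, so onDisconnected() cannot null the client mid-action.
    return action.apply(client);
  }
}
```

Running the action under the monitor means a Disconnected event cannot null the client halfway through an operation. The cost is that store operations are serialized with connection-state changes.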
ZKRMStateStore has become very unwieldy and hard to manage, and we should
definitely clean it up. On reflection, I think we should just have
RMZooKeeper and/or RMFencingZooKeeper classes that extend/wrap ZooKeeper and
override the methods we need. That would make the remaining ZK code much easier
to read and maintain. I would like to work on this in a separate JIRA targeted
for 2.5.
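The wrapper idea could look roughly like the sketch below. This is a hedged illustration, not a design from the source. The real ZooKeeper class is not used. `ZkOps` is a hypothetical two-method interface standing in for the ZooKeeper calls the store makes, and `RMZooKeeperSketch` is an illustrative name, not a Hadoop class. Callers always hold a non-null wrapper, so a missing session surfaces as a descriptive exception rather than an NPE at an arbitrary call site.

```java
/** Hypothetical minimal subset of the ZooKeeper operations the store uses. */
interface ZkOps {
  byte[] getData(String path) throws Exception;
  void setData(String path, byte[] data) throws Exception;
}

/**
 * Sketch of an RMZooKeeper-style wrapper: call sites never touch a raw,
 * possibly-null client field; the wrapper owns the session reference.
 */
class RMZooKeeperSketch implements ZkOps {
  private volatile ZkOps delegate; // swapped on reconnect; null while disconnected

  void onConnected(ZkOps newSession) {
    delegate = newSession;
  }

  void onDisconnected() {
    delegate = null;
  }

  /** Reads the volatile field once and fails with a clear message if absent. */
  private ZkOps current() {
    ZkOps d = delegate;
    if (d == null) {
      throw new IllegalStateException("ZooKeeper session is disconnected");
    }
    return d;
  }

  @Override
  public byte[] getData(String path) throws Exception {
    return current().getData(path);
  }

  @Override
  public void setData(String path, byte[] data) throws Exception {
    current().setData(path, data);
  }
}
```

An RMFencingZooKeeper-style variant could subclass such a wrapper and override only the methods that need fencing. How that fencing is done is left open here.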
I haven't added a test in the patch, and I am tempted not to, given the plan to
clean up/revamp the ZK interactions. I can add one if others insist.
> Potential NPE in ZKRMStateStore caused by handling Disconnected event from ZK.
> ------------------------------------------------------------------------------
>
> Key: YARN-1934
> URL: https://issues.apache.org/jira/browse/YARN-1934
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.4.0
> Reporter: Rohith
> Assignee: Karthik Kambatla
> Priority: Blocker
> Attachments: RM.txt, yarn-1934-0.patch
>
>
> On a ZK Disconnected event, zkClient is set to null, so any code that then
> dereferences zkClient directly is very prone to throwing an NPE.
> {noformat}
> case Disconnected:
> LOG.info("ZKRMStateStore Session disconnected");
> oldZkClient = zkClient;
> zkClient = null;
> break;
> {noformat}
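To make the failure mode concrete, this single-threaded sketch mirrors the quoted handler. Once `onDisconnected` nulls `zkClient`, any method that dereferences the field directly throws a NullPointerException. `FakeZk`, `NullingStore` and `load` are hypothetical stand-ins. In the real store the Disconnected event arrives on the ZooKeeper event thread, so the NPE depends on timing rather than on call order as it does here.

```java
/** Hypothetical stand-in for the ZooKeeper client. */
class FakeZk {
  String getData(String path) {
    return "data@" + path;
  }
}

/** Mirrors the quoted pattern: the Disconnected handler nulls the field. */
class NullingStore {
  FakeZk zkClient = new FakeZk();
  FakeZk oldZkClient;

  /** Same steps as the quoted "case Disconnected" branch. */
  void onDisconnected() {
    oldZkClient = zkClient;
    zkClient = null; // any later direct use of zkClient now throws an NPE
  }

  /** Direct reference to zkClient with no null check. */
  String load(String path) {
    return zkClient.getData(path);
  }
}
```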
--
This message was sent by Atlassian JIRA
(v6.2#6252)