When you get SUSPENDED/LOST, you should exit your leader selector handler’s takeLeadership() method. But there’s no reason to close the leader selector instance. Once the connection is re-established, the clients will contend to be the leader again.
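A rough sketch of that pattern, under stated assumptions: this is a self-contained model, not Curator code. `Selector`, `ConnState`, `stateChanged()` and `await()` are hypothetical stand-ins so the example runs without a ZooKeeper ensemble. On SUSPENDED or LOST only the current takeLeadership() call is cancelled (its thread is interrupted and the method returns). The selector object itself stays open and contends again on RECONNECTED. Waiting explicitly for RECONNECTED is a simplification of this model. In real Curator, `LeaderSelectorListenerAdapter` cancels leadership on SUSPENDED/LOST by default by throwing `CancelLeadershipException`, and `LeaderSelector.autoRequeue()` makes the same instance re-enter the election after takeLeadership() returns.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical model, NOT Curator's API; names only mirror the states discussed here.
final class LeadershipSketch {
    enum ConnState { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    // One long-lived selector that is never closed just because of SUSPENDED/LOST.
    static final class Selector implements AutoCloseable {
        private final ExecutorService leaderThread = Executors.newSingleThreadExecutor();
        final AtomicInteger termsStarted = new AtomicInteger();
        final AtomicInteger termsEnded = new AtomicInteger();
        private Future<?> term; // the queued or running takeLeadership() call
        private boolean closed;

        synchronized void start() { contend(); }

        // Queue a takeLeadership() call; in this toy model the instance always "wins".
        private void contend() { term = leaderThread.submit(this::takeLeadership); }

        // Leader-only work runs until the thread is interrupted, then the method returns.
        private void takeLeadership() {
            termsStarted.incrementAndGet();
            try {
                while (true) {
                    Thread.sleep(5); // placeholder for real leader work
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt, then exit
            } finally {
                termsEnded.incrementAndGet();
            }
        }

        synchronized void stateChanged(ConnState state) {
            if (closed) return;
            if (state == ConnState.SUSPENDED || state == ConnState.LOST) {
                // Exit takeLeadership(), but keep this selector instance alive.
                if (term != null) { term.cancel(true); term = null; }
            } else if (state == ConnState.RECONNECTED && term == null) {
                contend(); // the SAME instance contends again; no new selector is built
            }
        }

        // Close only when the application is completely done with the selector.
        @Override
        public synchronized void close() {
            closed = true;
            if (term != null) term.cancel(true);
            leaderThread.shutdownNow();
        }
    }

    // Polls a condition for up to timeoutMs; a demo helper only.
    static boolean await(BooleanSupplier condition, long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (System.nanoTime() < deadline) {
            if (condition.getAsBoolean()) return true;
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return condition.getAsBoolean();
    }

    public static void main(String[] args) {
        try (Selector selector = new Selector()) {
            selector.start();
            await(() -> selector.termsStarted.get() == 1, 2000);
            selector.stateChanged(ConnState.SUSPENDED);   // leader work stops
            await(() -> selector.termsEnded.get() == 1, 2000);
            selector.stateChanged(ConnState.RECONNECTED); // same selector contends again
            await(() -> selector.termsStarted.get() == 2, 2000);
            System.out.println("terms started: " + selector.termsStarted.get());
        }
    }
}
```

The point of the design is that reconnecting never constructs a new selector. Building a fresh one on every reconnect is what queued an extra zNode under the leader path in the quoted thread.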
In Curator, as a general rule, only close objects when you are completely done with them.

-Jordan

> On Jul 14, 2016, at 7:38 AM, Cantrell, Curtis <[email protected]> wrote:
>
> When I wrote this code over a year ago, my understanding of proper handling
> of error conditions was to suspend the leaders, locks, etc. when the
> connection was SUSPENDED and to rebuild the leaders, locks, etc. if the
> connection had been LOST. I believe I have been getting Connection LOST
> when the session was really still alive. When my code then, upon
> RECONNECT, created a new LeaderSelector, this caused a new zNode to be
> added (queued) to the leader path. Clearly, this is not the correct error
> handling.
>
> Today, I am upgrading to Curator 3.x and ZooKeeper 3.5. You imply that
> I should not close the LeaderSelector on a LOST. What is the correct
> handling, assuming I am using the 3.x branch of Curator?
>
> Thank you,
> Curtis
>
> From: Jordan Zimmerman [mailto:[email protected]]
> Sent: Wednesday, July 13, 2016 4:26 PM
> To: [email protected]
> Subject: Re: Problem with LeaderSelector 2.7.1
>
> I quickly looked at your code and don’t understand why you close the leader
> selector on connection LOST. Does your network partition often? Also, are
> you really creating a new Curator instance for every leader selector? You
> should create one Curator instance for your entire application.
>
> -JZ
>
> On Jul 13, 2016, at 1:41 PM, Cantrell, Curtis <[email protected]> wrote:
>
> It looks like maybe there are two fixes that affect my problem: CURATOR-264
> and CURATOR-247. Has CURATOR-247 been merged to the 2.x branch, or do I
> need to update my ZooKeeper to 3.5 in order to get the fix?
>
> Leader election: Duplicate ephemeral nodes with same owner id
> https://issues.apache.org/jira/browse/CURATOR-264
>
> We sometimes experience failure in our leader-election functionality when we
> have network issues. When this situation occurs we see that there are two
> ephemeral nodes in the zookeeper cluster for the same session but there is no
> active leader.
>
> Extend Curator's connection state to support SESSION_LOST
> https://issues.apache.org/jira/browse/CURATOR-247
>
> Curator has a connection state for LOST that confuses users. It does not mean
> that the session is lost. Instead it means that the retry policy has given up
> retrying.
