Re: Problem with LeaderSelector 2.7.1

Jordan Zimmerman Thu, 14 Jul 2016 08:19:43 -0700

When you get SUSPENDED/LOST, you should exit your leader selector handler’s 
takeLeadership() method. But, there’s no reason to close the leader selector 
instance. Once the connection is re-established the clients will contend to be 
the leader again.


In Curator, as a general rule, only close objects when you are completely done 
with them.

-Jordan
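
The advice above can be modelled without any Curator dependency. The following is a minimal sketch, not Curator code: ConnState, Selector, and runTerm are hypothetical stand-ins invented for illustration. It shows the two halves of the rule: a SUSPENDED or LOST event interrupts the thread running takeLeadership() so that the method returns, while the selector object itself stays open and can run another term later. In Curator itself, extending LeaderSelectorListenerAdapter gives this state handling by default (its stateChanged() throws CancelLeadershipException on SUSPENDED or LOST), and calling autoRequeue() on the LeaderSelector before start() makes it rejoin the election after takeLeadership() returns.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Library-free model of the pattern. Every type here is a hypothetical
// stand-in for illustration; none of them is a Curator class.
public class LeadershipSketch {
    enum ConnState { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    /** Stand-in for a long-lived selector: created once, closed only at shutdown. */
    static class Selector {
        final AtomicBoolean closed = new AtomicBoolean(false);
        final AtomicInteger terms = new AtomicInteger(0);
        private volatile Thread leaderThread;

        /** Runs one leadership term and returns as soon as the thread is interrupted. */
        void takeLeadership(CountDownLatch started) {
            leaderThread = Thread.currentThread();
            terms.incrementAndGet();
            started.countDown();
            try {
                while (true) {
                    Thread.sleep(10); // placeholder for real leader work
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag, then return
            } finally {
                leaderThread = null;
            }
        }

        /** On SUSPENDED or LOST, give up leadership but do NOT close the selector. */
        void stateChanged(ConnState state) {
            if (state == ConnState.SUSPENDED || state == ConnState.LOST) {
                Thread t = leaderThread;
                if (t != null) {
                    t.interrupt();
                }
            }
        }

        /** Called only when the application is completely done with the selector. */
        void close() {
            closed.set(true);
        }
    }

    /** Starts one term, delivers the event, and reports whether the term ended. */
    static boolean runTerm(Selector selector, ConnState event) {
        CountDownLatch started = new CountDownLatch(1);
        Thread t = new Thread(() -> selector.takeLeadership(started));
        t.setDaemon(true);
        t.start();
        try {
            started.await();
            selector.stateChanged(event);
            t.join(2000);
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return !t.isAlive();
    }

    public static void main(String[] args) {
        Selector selector = new Selector();
        boolean ended = runTerm(selector, ConnState.SUSPENDED);
        System.out.println("term ended: " + ended
                + ", selector closed: " + selector.closed.get());
        selector.close(); // only at application shutdown
    }
}
```

The same Selector instance is reused across terms in this model, which mirrors the point of the reply: losing the connection ends the current term, not the life of the selector.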

> On Jul 14, 2016, at 7:38 AM, Cantrell, Curtis <[email protected]> 
> wrote:
> 
> When I wrote this code over a year ago, my understanding of proper error 
> handling was to suspend the leaders, locks, etc. when the connection was 
> SUSPENDED and to rebuild them if the connection had been LOST. I believe I 
> have been getting connection LOST when the session was really still alive. 
> My code then created a new LeaderSelector upon RECONNECTED, which caused a 
> new zNode to be added (queued) to the leader path. Clearly, this is not the 
> correct error handling.
>  
> Today, I am upgrading to Curator 3.x and ZooKeeper 3.5. You imply that I 
> should not be closing the LeaderSelector on LOST. What is the correct 
> handling, assuming I am using the 3.x branch of Curator?
>  
> Thank you,
> Curtis
>  
> From: Jordan Zimmerman [mailto:[email protected]] 
> Sent: Wednesday, July 13, 2016 4:26 PM
> To: [email protected]
> Subject: Re: Problem with LeaderSelector 2.7.1
>  
> I quickly looked at your code and don’t understand why you close the leader 
> selector on connection LOST. Does your network partition often?  Also, are 
> you really creating a new Curator instance for every leader selector? You 
> should create one Curator instance for your entire application.
>  
> -JZ 
>  
> On Jul 13, 2016, at 1:41 PM, Cantrell, Curtis <[email protected]> wrote:
>  
> It looks like there may be two fixes that affect my problem: CURATOR-264 
> and CURATOR-247. Has CURATOR-247 been merged to the 2.x branch, or do I 
> need to update my ZooKeeper to 3.5 in order to get the fix?
> Leader election: Duplicate ephemeral nodes with same owner id
> https://issues.apache.org/jira/browse/CURATOR-264
>  
> We sometimes experience failure in our leader-election functionality when we 
> have network issues. When this situation occurs we see that there are two 
> ephemeral nodes in the zookeeper cluster for the same session but there is no 
> active leader.
> Extend Curator's connection state to support SESSION_LOST
> https://issues.apache.org/jira/browse/CURATOR-247
>  
> Curator has a connection state for LOST that confuses users. It does not mean 
> that the session is lost. Instead it means that the retry policy has given up 
> retrying
>  
>  
