Dear Curator(s),

A couple of days ago we did some maintenance on our ZooKeeper ensemble and performed a rolling restart of each node. Restarting the followers worked like a charm. However, restarting the leader caused CuratorConnectionLossException exceptions to be thrown and logged, and they trickled down to our application code until a reelection had occurred. Example:
https://gist.github.com/JensRantil/309fa1bf17ee2982b8e7

We were hoping that Curator would gracefully retry until a new leader had been elected, but I'm sure there is something we need to tweak to keep this from happening again.

*Question:* To avoid this in the future, should we simply change our retry policy so that it retries for longer before giving up?

Additional information:

- ZooKeeper version 1.4.5
- Curator version 2.7.0
- We are currently using the following retry policy: new ExponentialBackoffRetry(1000, 3);
- ZooKeeper configuration is all default except initLimit=60 and syncLimit=30.

Thanks,
Jens

--
Jens Rantil
Backend engineer
Tink AB

Email: [email protected]
Phone: +46 708 84 18 32
Web: www.tink.se

Facebook <https://www.facebook.com/#!/tink.se>
Linkedin <http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo&trkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary>
Twitter <https://twitter.com/tink>
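
P.S. To make the question more concrete, here is a rough sketch of how long ExponentialBackoffRetry(1000, 3) might keep retrying before giving up. It rests on an assumption: that the policy sleeps for baseSleepTimeMs * max(1, r) before each retry, where r is a random integer in [0, 2^(retryCount+1)) and retryCount starts at 0. The class RetryBudget and its method names are made up for illustration and are not part of Curator, and the model only counts sleep time, not time spent waiting on the connection itself.

```java
// Rough model of the retry window. The sleep formula is an ASSUMPTION about
// ExponentialBackoffRetry in Curator 2.x; RetryBudget is a hypothetical helper.
public class RetryBudget {

    // Longest possible sleep before retry number retryCount (0-based),
    // assuming sleep = baseSleepMs * max(1, r) with r in [0, 2^(retryCount+1)).
    static long maxSleepMs(long baseSleepMs, int retryCount) {
        long largestR = (1L << (retryCount + 1)) - 1;
        return baseSleepMs * Math.max(1L, largestR);
    }

    // Upper bound on the total time spent sleeping across all retries.
    static long maxTotalSleepMs(long baseSleepMs, int maxRetries) {
        long total = 0;
        for (int i = 0; i < maxRetries; i++) {
            total += maxSleepMs(baseSleepMs, i);
        }
        return total;
    }

    // Lower bound: every retry sleeps at least baseSleepMs under the assumed formula.
    static long minTotalSleepMs(long baseSleepMs, int maxRetries) {
        return baseSleepMs * maxRetries;
    }

    public static void main(String[] args) {
        long base = 1000; // baseSleepTimeMs from our current policy
        int retries = 3;  // maxRetries from our current policy
        System.out.println("sleep budget: " + minTotalSleepMs(base, retries)
                + " to " + maxTotalSleepMs(base, retries) + " ms");
    }
}
```

If that assumption holds, the current policy gives up after roughly 3 to 11 seconds of sleeping, so a reelection that took longer than that would plausibly produce the exceptions we saw.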
