I'm not an expert, but I don't think there is a magic bullet here: leader election has to happen in this circumstance, and that takes time.
You may be better served by building better resilience so that ZooKeeper's uptime is no longer a single point of failure in your services layer. Pinterest and Airbnb both have some prior art here:

http://engineering.pinterest.com/post/77933733851/zookeeper-resilience-at-pinterest
http://nerds.airbnb.com/smartstack-service-discovery-cloud/

I'm curious why you chose a cross-DC ensemble versus localized same-region ensembles. Don't you deal with a significant frequency of leader elections from being in 3 regions anyway?

On Sat, Oct 11, 2014 at 11:21 AM, Jeff Potter <[email protected]> wrote:

> The reason I ask is that we’ve noticed, when running zookeeper cross-DC,
> that restarting the node that’s currently the leader causes a brief but
> real service interruption for 3 to 5 seconds while the rest of the cluster
> elects a new leader and syncs. We’re on AWS, with 2 ZK nodes in US-East, 2
> in US-West-2, and 1 in US-West (as a tie-breaker).
>
> It would seem taking a leader to follower status would be useful; and
> doing so without it actually being a stop / disconnect on all clients
> connected to the node. (Especially for doing rolling restarts of all nodes,
> e.g. XEN-108 bug.)
>
> -Jeff
>
> On Oct 10, 2014, at 10:16 AM, Ivan Kelly <[email protected]> wrote:
>
> > Or just pause the process until someone else takes over.
> >
> > 1. kill -STOP <zookeeper_pid>
> > 2. // wait for election to happen
> > 3. kill -CONT <zookeeper_pid>
> >
> > This won't stop it from becoming leader again. Also, clients may migrate to
> > other servers.
> >
> > -Ivan
> >
> > Alexander Shraer writes:
> >
> >> Hi,
> >>
> >> I don't think there's a direct way, although this seems a useful thing to
> >> add.
> >>
> >> One thing you could do is to issue a reconfig changing the leader's
> >> leading/quorum port (through which it talks with the followers). This
> >> will cause it to give up leadership while keeping it in the cluster.
> >>
> >> Cheers,
> >> Alex
> >>
> >> On Fri, Oct 10, 2014 at 5:57 AM, Jeff Potter <[email protected]> wrote:
> >>
> >>> Hi,
> >>>
> >>> Is there a way to “retire” a leader while keeping it in the cluster?
> >>>
> >>> Thanks,
> >>> Jeff
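
P.S. For what it's worth, the pause-and-resume idea Ivan describes in the quoted thread could be scripted roughly like this. It is only a sketch under some assumptions: it runs on the current leader's host with permission to signal the ZooKeeper process, the other servers answer the "srvr" four-letter command on their client port, and the function names, the peer list, and the timeouts are all made up for illustration. As Ivan notes, nothing here prevents the paused server from being elected again later.

```python
import os
import signal
import socket
import time


def parse_mode(srvr_output):
    """Return the value of the 'Mode:' line from 'srvr' output, or None."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def query_mode(host, port=2181, timeout=2.0):
    """Send 'srvr' to one server; return its mode, or None if unreachable."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"srvr")
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
    except OSError:
        return None
    return parse_mode(b"".join(chunks).decode("utf-8", "replace"))


def pause_until_new_leader(pid, peers, max_wait=30.0, poll=0.5):
    """SIGSTOP the local server, wait until a peer reports 'leader', then SIGCONT.

    peers is a list of (host, port) pairs for the OTHER servers (hypothetical).
    Returns True if a peer took over within max_wait seconds, else False.
    """
    os.kill(pid, signal.SIGSTOP)
    try:
        deadline = time.monotonic() + max_wait
        while time.monotonic() < deadline:
            if any(query_mode(h, p) == "leader" for h, p in peers):
                return True
            time.sleep(poll)
        return False
    finally:
        # Always resume the process, even if no new leader was observed.
        os.kill(pid, signal.SIGCONT)
```

You would call it as something like pause_until_new_leader(zk_pid, [("zk2.example.com", 2181), ("zk3.example.com", 2181)]), where the hostnames are placeholders. The finally block is there so the server is never left stopped if the polling loop fails.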
