P.S. That was about 1 second in a cluster; over a WAN I believe the benefit of the current simple implementation will be bigger.
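P.P.S. For anyone who wants to experiment with the reconfig workaround mentioned further down the thread (changing the leader's quorum port so it gives up leadership but stays in the ensemble), a rough, untested sketch from zkCli.sh might look like the following. It assumes a 3.5.x ensemble running with dynamic configuration; the server id, hostname and ports are placeholders, and newer releases may additionally require reconfigEnabled=true plus appropriate auth:

  # suppose server.2 is the current leader, on quorum port 2888,
  # election port 3888 and client port 2181
  config    # print the current dynamic configuration
  reconfig -add server.2=zk2.example.com:2889:3888:participant;2181
  # re-specifying an existing server id with a new quorum port updates its
  # spec; applying it should make the leader step down while staying a voter
  config    # verify the new configuration took effect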
On Mon, Oct 13, 2014 at 9:06 AM, Alexander Shraer <[email protected]> wrote:

> I agree that such a feature could be very useful.
>
> > one could announce to the other nodes that the leader is retiring, so
> > there’s no need to wait for failed heartbeat responses to realize that
> > the leader is no longer serving.
>
> This is actually what happens when the leader steps down during a reconfig
> operation (such as when changing the leading port, removing the leader or
> making it an observer), so it should be possible to add an explicit command
> to trigger this mechanism as you suggest, if someone wants to take on this
> implementation.
>
> It saved about 1 second in my experiments (which is probably the timeout
> you mention and a few rounds of fast leader election), but can still be
> optimized further. For example, for simplicity I still go back to leader
> election, with an initial vote indicating who the new designated leader
> should be, so even though leader election terminates after one round it is
> not completely avoided as it could be.
>
>
> On Mon, Oct 13, 2014 at 8:25 AM, Jeff Potter <[email protected]> wrote:
>
>> We’re using zookeeper cross-DC to coordinate communication of data that’s
>> served to our iOS app via HTTP API calls — in this case, the hosts that
>> the app should be connecting to for chat. Chat nodes get added into the
>> cluster and register themselves in zookeeper; meanwhile, clients issue
>> API calls to web servers that return a list of chat nodes that the client
>> should be connecting to. There are a few other global settings that we
>> also coordinate via zookeeper, but that stuff could, in theory, be
>> manually applied to each of the DCs, since changes to it are manual. (We
>> also run cassandra cross-DC, so we already have dependencies on talking
>> cross-DC; hence two main DCs and a tie-breaker third DC that also serves
>> as a back-up DC.)
>>
>> I’ve seen SmartStack before, and it seems like a good potential solution
>> at larger scales, but at our current size / capacity, registering
>> directly on top of zookeeper is lightweight and simple enough. I haven’t
>> seen the Pinterest writeup; thanks for sending it!
>>
>> You’d asked about the frequency of leader elections. We don’t see leader
>> elections happening that often — the only time they come up is when we do
>> something to take down the current leader, which is very, very rare — our
>> deploys don’t need to restart that service. So far, the only time it’s
>> happened in a year+ is the XEN-108 bug that caused the node to reboot.
>>
>> To be clear, we’re “okay” with the leader re-election time; I’m just
>> surprised that it’s as choppy as it is, and we were surprised looking
>> through the “service zookeeper stop” target at how it was implemented. I
>> would think there’d be some benefit to having a leader “step down”, in
>> that one could announce to the other nodes that the leader is retiring,
>> so there’s no need to wait for failed heartbeat responses to realize that
>> the leader is no longer serving.
>>
>> -Jeff
>>
>>
>> On Oct 11, 2014, at 2:09 PM, ralph tice <[email protected]> wrote:
>>
>> > I'm not an expert, but I don't think there is a magic bullet here:
>> > leader election has to happen in this circumstance, and that takes time.
>> >
>> > You may be better served by building better resilience to eliminate
>> > ZooKeeper's uptime from being a single point of failure in your
>> > services layer.
>> > Pinterest and Airbnb both have some prior art here:
>> > http://engineering.pinterest.com/post/77933733851/zookeeper-resilience-at-pinterest
>> > and http://nerds.airbnb.com/smartstack-service-discovery-cloud/
>> >
>> > I'm curious why you chose a cross-DC ensemble versus localized
>> > same-region ensembles. Don't you deal with a significant frequency of
>> > leader elections from being in 3 regions anyway?
>> >
>> >
>> > On Sat, Oct 11, 2014 at 11:21 AM, Jeff Potter <[email protected]> wrote:
>> >
>> >> The reason I ask is that we’ve noticed, when running zookeeper
>> >> cross-DC, that restarting the node that’s currently the leader causes
>> >> a brief but real service interruption for 3 to 5 seconds while the
>> >> rest of the cluster elects a new leader and syncs. We’re on AWS, with
>> >> 2 ZK nodes in US-East, 2 in US-West-2, and 1 in US-West (as a
>> >> tie-breaker).
>> >>
>> >> It would seem that taking a leader to follower status would be useful,
>> >> and doing so without it actually being a stop / disconnect for all
>> >> clients connected to the node. (Especially for doing rolling restarts
>> >> of all nodes, e.g. for the XEN-108 bug.)
>> >>
>> >> -Jeff
>> >>
>> >>
>> >> On Oct 10, 2014, at 10:16 AM, Ivan Kelly <[email protected]> wrote:
>> >>
>> >>> Or just pause the process until someone else takes over.
>> >>>
>> >>> 1. kill -STOP <zookeeper_pid>
>> >>> 2. // wait for election to happen
>> >>> 3. kill -CONT <zookeeper_pid>
>> >>>
>> >>> This won't stop it from becoming leader again. Also, clients may
>> >>> migrate to other servers.
>> >>>
>> >>> -Ivan
>> >>>
>> >>> Alexander Shraer writes:
>> >>>
>> >>>> Hi,
>> >>>>
>> >>>> I don't think there's a direct way, although this seems a useful
>> >>>> thing to add.
>> >>>>
>> >>>> One thing you could do is to issue a reconfig changing the leader's
>> >>>> leading/quorum port (through which it talks with the followers).
>> >>>> This will cause it to give up leadership while keeping it in the
>> >>>> cluster.
>> >>>>
>> >>>> Cheers,
>> >>>> Alex
>> >>>>
>> >>>> On Fri, Oct 10, 2014 at 5:57 AM, Jeff Potter <[email protected]> wrote:
>> >>>>
>> >>>>> Hi,
>> >>>>>
>> >>>>> Is there a way to “retire” a leader while keeping it in the cluster?
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Jeff
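For completeness, Ivan's pause trick from earlier in the thread can be scripted roughly as follows. This is only a sketch: it assumes a single ZooKeeper server process on the host (matched by its QuorumPeerMain main class) and that a few seconds is enough for the remaining servers to elect a new leader:

  # find the ZooKeeper server pid (assumes one QuorumPeerMain JVM on this host)
  ZK_PID=$(pgrep -f QuorumPeerMain)
  kill -STOP "$ZK_PID"   # freeze the process; the rest of the ensemble elects a new leader
  sleep 10               # wait for the election to complete
  kill -CONT "$ZK_PID"   # resume; the server rejoins as a follower (though it may become leader again later)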
