Hi Martin,

Yes, reconfig, like other ZooKeeper operations, works only when there's a quorum. Although you're saying that zone 1 failed, it may be the case that only the link between zone 1 and zone 2 failed and the zones themselves are fine. If we allowed each zone to keep processing commands, reconfig or otherwise, we would end up with split-brain and lose consistency.
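For illustration, this is roughly what a membership change via reconfig looks like from zkCli.sh in 3.5.0 when a quorum does exist (the server IDs and the host name here are made up):

    # remove server 5 and add a new participant in one atomic step;
    # this only succeeds while a quorum of the current ensemble is alive
    reconfig -remove 5 -add server.7=zone3-host:2888:3888;2181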
If you're sure that zone 1 is down, you could shut down the servers in zone 2, change the configuration files to exclude zone 1 (a sketch of such a trimmed config follows below the quoted message), and restart. Note that when you restart you should bring the servers up in an order that doesn't allow a quorum to form without someone holding the latest state; otherwise you'll lose data.

Example: zone 1 has participant replicas A, B, C; zone 2 has participants D, E, F. The latest state is on A, B, C, D. Zone 1 fails and you restart the zone 2 servers, but E and F come up first. In this case you're likely to lose the latest updates.

Perhaps others can suggest a better solution, but you could consider having a tie-breaker replica somewhere in a third location (also sketched below). Or, if you don't need consistency between the zones, you could run 2 separate ZooKeeper ensembles. Does your application require consistency between zones 1 and 2?

Alex

On Wed, Sep 17, 2014 at 1:19 PM, Martin Grotzke <[email protected]> wrote:
> Hi,
>
> Is it true that the reconfig command that's available since 3.5.0 can only
> be used if there's a quorum?
>
> Our situation is that we have 2 datacenters (actually only 2 zones within
> the same DC) which will be provisioned equally, so that we'll have an even
> number of ZK nodes (true, not optimal). When 1 zone fails, there won't be a
> quorum any more and ZK will be unavailable - that's my understanding. Is it
> possible to add new nodes to the ZK cluster and achieve a quorum again
> while the failed zone is still unavailable?
>
> What would you recommend for handling this situation?
>
> We're using (going to use) SolrCloud as clients.
>
> Thanks && cheers,
> Martin
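Here's the trimmed config mentioned above, as a sketch in the classic static zoo.cfg style (server IDs and host names are hypothetical; dataDir, clientPort, etc. stay as they were):

    # zoo.cfg on D, E, F after dropping the zone 1 entries
    server.4=zone2-d:2888:3888
    server.5=zone2-e:2888:3888
    server.6=zone2-f:2888:3888

Remember to bring D up first in this example, since it's the one holding the latest state.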

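And a sketch of the tie-breaker layout (again with hypothetical hosts): with an odd number of voters spread across three locations, either zone can fail and the surviving zone plus the tie-breaker still form a quorum:

    # 3 + 3 + 1 layout: losing either zone leaves 4 of 7 voters, still a quorum
    server.1=zone1-a:2888:3888
    server.2=zone1-b:2888:3888
    server.3=zone1-c:2888:3888
    server.4=zone2-d:2888:3888
    server.5=zone2-e:2888:3888
    server.6=zone2-f:2888:3888
    server.7=zone3-tiebreaker:2888:3888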