One of ZooKeeper's requirements is that a network partition must not lead to inconsistent results. If you have 2 servers, a partition leaves a symmetrical situation that you have to break in order to have a reasonable way to continue.
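To make the symmetry concrete, here is a small sketch of the majority-quorum arithmetic (n/2 + 1, as Henry describes in the quoted message below). The function names are made up for illustration and are not part of any ZooKeeper API; the sketch only assumes a plain majority quorum over voting members.

```python
from typing import List


def quorum_size(voters: int) -> int:
    """Smallest strict majority of `voters` voting members (n/2 + 1)."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can be lost while a quorum is still reachable."""
    return voters - quorum_size(voters)


def sides_with_quorum(voters: int, left: int) -> List[str]:
    """Given a partition into `left` and `voters - left` members,
    return which side (if any) still holds a majority."""
    right = voters - left
    needed = quorum_size(voters)
    return [name for name, size in (("left", left), ("right", right))
            if size >= needed]


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, "voters: quorum", quorum_size(n),
              "tolerates", tolerated_failures(n), "failure(s)")
    # Two voters split 1/1: neither side has a majority.
    print(sides_with_quorum(2, 1))
    # Three voters split 2/1: only the larger side can continue.
    print(sides_with_quorum(3, 2))
```

With two voters the quorum is 2, so no failure is tolerated and a 1/1 split leaves neither side able to proceed. With three voters, one side of any partition always keeps a majority.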
Since you can't, in general, distinguish between a partition and a node failure, you can't have a 2 node system that maintains consistency correctly and also continues operation when either node fails.

ZK reflects this situation. You can configure your ZK cluster so that it handles a network partition or one specific failed node, but you cannot handle a partition and also allow either node to fail.

To configure your cluster that way, set up the second node as an observer. The observer can fail and ZK will continue operations, but if you lose the non-observer then ZK will freeze. Likewise, during a partition, nodes that are connected to the observer but unable to reach the non-observer will not be able to get ZK service. In contrast, nodes connected to the non-observer will be able to continue operations.

See here for more information:

http://zookeeper.apache.org/doc/trunk/zookeeperObservers.html

On Thu, Feb 2, 2012 at 4:19 PM, Alan Perez-Rathke <[email protected]> wrote:

> Okay thanks.
>
> Just curious, is it currently possible to have replicated ZooKeeper servers that will still function if only one of the servers is online?
>
> --
> Alan
>
>
> On Thursday, February 2, 2012 at 6:09 PM, Henry Robinson wrote:
>
> > Yes, that's right. More precisely, a quorum of peers (hence the name ;)) must participate in an election round for it to terminate.
> >
> > In your case, with a two node cluster, a quorum is usually n/2 + 1 nodes, which is 2. So both nodes must be available.
> >
> > Henry
> >
> > Sent from my iPad
> >
> > On Feb 2, 2012, at 3:58 PM, Alan Perez-Rathke <[email protected]> wrote:
> >
> > > Hello,
> > >
> > > I am encountering a scenario in which I have started two QuorumPeers and then I call shutdown() on one of them (in order to simulate a server crashing).
> > >
> > > The remaining QuorumPeer then busy loops within Election::lookForLeader(). It never appears to be able to return from this loop.
> > >
> > > I have seen this behavior on ZooKeeper versions 3.3.4 and 3.4.2 and with election algorithms: LeaderElection, AuthFastLeaderElection, and FastLeaderElection.
> > >
> > > Does this mean that QuorumPeer requires at least 2 peers online in order to not infinite loop within the internal leader election?
> > >
> > > Thanks,
> > > --
> > > Alan
