This is basically what I've seen. I set up a test with 3 VMs and then used iptables to block/unblock traffic between the VMs. It seems that the state after a split is highly dependent on the state before the split.
Is there a document somewhere that describes all of the conditions required to elect a leader (e.g. what needs to be connected to what)? I couldn't find this on the web.

thanks,

steve

On Fri, May 23, 2014 at 12:52 PM, Camille Fournier <[email protected]> wrote:

> Well, if A can't talk to C but B can talk to both, it kind of depends on
> what the state was before the partition, and then what happens after the
> partition.
> If the leader is in A, all of the members of C will go into disconnected
> state, but may also try to become leader since they can talk to B. You
> might see some weird thrashing of election state etc.
> If the leader is in B you might be fine but honestly I've never tested that
> so far as I can recall. Really, if one site loses contact with one or more
> others, you probably just want to kill all the connections in that site
> until connectivity comes back.
> Best thing to do if faced with this question is to actually run a test that
> simulates it since these things always have a ton of nuance; it is unlikely
> that you will lose any data (the basic rules of the protocol account for
> this fairly well), but the performance might degrade in an unexpected way.
> I think it could happen, in a very bad case, where quorum is made with A to
> B, then flips to C to B due to network whatever, and data gets truncated.
> I would put pretty aggressive monitoring around this if I were implementing
> such a situation and kill one of the partitions if it happened, given the
> byzantine nature of the edge cases.
>
> C
>
> On Wed, May 21, 2014 at 11:36 PM, Steven Bower <[email protected]> wrote:
>
> > I am contemplating setting up a zookeeper ensemble across multiple
> > facilities. I know the docs warn against multi-facility ensembles, but
> > for the sake of discussion can we assume that all are connected with the
> > same reliability/performance you'd expect if they were all in the same
> > LAN.
> >
> > Imagine an ensemble with three facilities (A, B and C). Within each
> > facility there are 3 instances of zookeeper. So total 9 members of the
> > ensemble, which gives us quorum at 5 instances. All facilities are
> > connected with point-to-point connections between each other (by
> > point-to-point I'm implying that if the connection between A and C went
> > down, A could not talk to C via B).
> >
> > With this environment what behaviors would I see if for example the link
> > between A and B went down?
> >
> > Any other recommendations?
> >
> > thanks,
> >
> > steve
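
For anyone following the thread, here is a rough sketch of the arithmetic behind the scenario above: 9 members, quorum at 5, and one inter-facility link down. It is only a simplified model, not ZooKeeper's actual election logic. It assumes a server can hold leadership only if it can directly reach a majority of the ensemble (counting itself), and it ignores zxids, timing, and prior state entirely. The names `FACILITIES`, `quorum_size`, `reachable_members`, and `leader_candidates` are made up for illustration.

```python
# Hypothetical model of the ensemble from the thread:
# three facilities (A, B, C) with three ZooKeeper servers each.
FACILITIES = {"A": 3, "B": 3, "C": 3}


def quorum_size(ensemble_size):
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def reachable_members(facility, down_links):
    """Count members a server in `facility` can reach, itself included.

    `down_links` is a set of frozensets, each naming the two facilities
    whose point-to-point link is down. No routing via a third facility.
    """
    total = 0
    for other, count in FACILITIES.items():
        if other == facility or frozenset((facility, other)) not in down_links:
            total += count
    return total


def leader_candidates(down_links):
    """Map each facility to whether its servers can still reach a majority."""
    need = quorum_size(sum(FACILITIES.values()))
    return {f: reachable_members(f, down_links) >= need for f in FACILITIES}


if __name__ == "__main__":
    # One link down: A cannot talk to C, B can talk to both.
    print(leader_candidates({frozenset(("A", "C"))}))
    # Two links down: A is fully isolated from B and C.
    print(leader_candidates({frozenset(("A", "B")), frozenset(("A", "C"))}))
```

In this model a single failed link still leaves every facility able to see a majority (6 of 9 for the two cut-off sites, 9 of 9 for the middle one), which fits the observation that members on either side may try to become leader. Only when a facility loses both of its links does it drop below 5 reachable members.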
