I see. This makes sense to me now. Thanks. Looking forward to this feature.
Regards,
Peter

On Tue, Aug 30, 2011 at 4:04 PM, Alexander Shraer <[email protected]> wrote:

> Hi Peter,
>
> We're currently working on adding dynamic reconfiguration functionality
> to Zookeeper. I hope that it will get in to the next release of ZK
> (after 3.4). With this you'll just run a new zk command to add/remove
> any servers, change ports, change roles (followers/observers), etc.
>
> Currently, membership is determined by the config file so the only way
> of doing this is "rolling restart". This means that you change
> configuration files and bounce the servers back. You should do it in a
> way that guarantees that at any time any quorum of the servers that are
> up intersects with any quorum of the old configuration (otherwise you
> might lose data). For example, if you're going from (A, B, C) to
> (A, B, C, D, E), it is possible that A and B have the latest data
> whereas C is falling behind (ZK stores data on a quorum), so if you just
> change the config files of A, B, C to say that they are part of the
> larger configuration then C might be elected with the support of D and E
> and you might lose data. So in this case you'll have to first add D, and
> later add E, this way the quorums intersect. Same thing when removing
> servers.
>
> Alex
>
> > -----Original Message-----
> > From: cheetah [mailto:[email protected]]
> > Sent: Tuesday, August 30, 2011 3:36 PM
> > To: [email protected]
> > Cc: [email protected]
> > Subject: Re: How zab avoid split-brain problem?
> >
> > Hi Alex,
> >
> > Thanks for the explanation.
> >
> > Then I have another question:
> >
> > If there are 7 machines in my current zookeeper clusters, two of them
> > are failed. How can I reconfigure the Zookeeper to make it working
> > with 5 machines? i.e if the master can get 3 machines' reply, it can
> > commit the transaction.
> >
> > On the other hand, if I add 2 machines to make a 9 node Zookeeper
> > cluster, how can I configure it to make it taking advantages of 9
> > machines?
> >
> > This is more related to user mailing list. So I cc to it.
> >
> > Thanks,
> > Peter
> >
> > On Tue, Aug 30, 2011 at 12:21 PM, Alexander Shraer <shralex@yahoo-inc.com> wrote:
> >
> > > Hi Peter,
> > >
> > > It's the second option. The servers don't know if the leader failed
> > > or was partitioned from them. So each group of 3 servers in your
> > > scenario can't distinguish the situation from another scenario where
> > > none of the servers failed but these 3 servers are partitioned from
> > > the other 4. To prevent a split brain in an asynchronous network a
> > > leader must have the support of a quorum.
> > >
> > > Alex
> > >
> > > > -----Original Message-----
> > > > From: cheetah [mailto:[email protected]]
> > > > Sent: Tuesday, August 30, 2011 12:23 AM
> > > > To: [email protected]
> > > > Subject: How zab avoid split-brain problem?
> > > >
> > > > Hi folks,
> > > >
> > > > I am reading the zab paper, but a bit confusing how zab handle
> > > > split brain problem.
> > > >
> > > > Suppose there are A, B, C, D, E, F and G seven servers, now A is
> > > > the leader. When A dies and at the same time, B,C,D are isolated
> > > > from E, F and G.
> > > >
> > > > In this case, will Zab continue working like this: if B>C>D and
> > > > E>F>G, so the two groups are both voting and electing B and E as
> > > > their leaders separately. Thus, there is a split brain problem.
> > > >
> > > > Or Zookeeper just stop working, because there were original 7
> > > > servers, after 1 failure, a new leader still expects to have a
> > > > quorum of 3 servers voting for it as the leader. And because the
> > > > two groups are separate from each other, no leader can be elected
> > > > out.
> > > >
> > > > If it is the first case, Zookeeper will have a split brain
> > > > problem, which probably is not the case. But in the second case, a
> > > > 7-node Zookeeper service can only handle a node failure and a
> > > > network partition failure.
> > > >
> > > > Am I understanding wrongly? Looking forward to your insights.
> > > >
> > > > Thanks,
> > > > Peter
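
For anyone reading this thread later, the quorum arithmetic discussed above can be sketched in a few lines of Python. This is only an illustration, assuming simple majority quorums; the function names (majority, can_elect_leader, quorums_intersect) are made up for this example and are not part of any ZooKeeper API. The intersection check is also a simplification of Alex's rule: it compares the majorities of one configuration against the majorities of another, and does not model which servers are actually up during a rolling restart.

```python
from itertools import combinations


def majority(n: int) -> int:
    """Smallest number of servers that forms a strict majority of n."""
    return n // 2 + 1


def can_elect_leader(group_size: int, ensemble_size: int) -> bool:
    """A group can elect a leader only if it holds a majority of the
    configured ensemble, regardless of how many servers have failed."""
    return group_size >= majority(ensemble_size)


def quorums(config):
    """All minimal majority quorums of a configuration (hypothetical helper)."""
    members = sorted(config)
    return [frozenset(q) for q in combinations(members, majority(len(members)))]


def quorums_intersect(old, new) -> bool:
    """True if every majority of `old` shares a server with every majority
    of `new`. Checking minimal quorums is enough: larger quorums contain them."""
    return all(q_old & q_new for q_old in quorums(old) for q_new in quorums(new))


# Split-brain scenario from the thread: 7 servers, A dies, {B,C,D} vs {E,F,G}.
print(majority(7))              # 4
print(can_elect_leader(3, 7))   # False: neither side of the partition has a quorum

# Seven configured servers with two failed: the remaining five still hold a majority.
print(can_elect_leader(5, 7))   # True

# Reconfiguration example from the thread.
abc = {"A", "B", "C"}
abcd = abc | {"D"}
abcde = abcd | {"E"}
print(quorums_intersect(abc, abcde))   # False: {A,B} and {C,D,E} are disjoint
print(quorums_intersect(abc, abcd))    # True: add D first
print(quorums_intersect(abcd, abcde))  # True: then add E
```

The False result for the direct jump from (A, B, C) to (A, B, C, D, E) corresponds to the case Alex describes, where C could be elected with the support of D and E while A and B hold the latest data.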
