Thanks a lot Alex! Excellent explanation :-)
On Sat, Jun 15, 2013 at 7:01 AM, Alexander Shraer <[email protected]> wrote:
> Hi German,
>
> During normal operation ZK guarantees that a quorum (any majority) of
> the ZK ensemble has all operations that may have been committed.
>
> So without dynamic reconfiguration you should ensure that when you're
> changing the ensemble, any possible quorum of the new ensemble
> necessarily intersects with any quorum of the 'old' ensemble.
>
> If you add D and E right away this property may not be guaranteed,
> since a new quorum (3 out of 5) can be for example C, D, E. Whereas
> its possible that only A and B have the latest state, so it'll be
> lost.
>
> To ensure this when going from 3 to 5 servers you should probably do 2
> transitions. First add server D. Any majority here, i.e., any 3 servers
> out of 4, will necessarily contain 2 servers from the original ensemble
> (A, B, C). So at least one server in any new quorum actually has the
> latest state.
>
> Then add E. A quorum here is any 3 out of 5 servers, so even if some
> quorum includes E and the server from (A, B, C, D) that
> didn't have the latest state, we still have 1 server in the quorum
> that does have the latest state, so we're fine...
>
> As long as you add servers one by one and wait for leader election to
> complete in every stage, or preserve the quorum intersection property
> in some other way, it should be safe. But with dynamic reconfig you
> don't need to do that and no reboots necessary of course.
>
> Alex
>
> On Fri, Jun 14, 2013 at 9:14 PM, German Blanco
> <[email protected]> wrote:
> > Hello,
> >
> > Could you please clarify if this thread is about a rolling start in an
> > ensemble without the dynamic reconfiguration support?
> > And when you say "Create a 5 node ensemble", that means quorum is 5. But
> > then you give server lists with only 3 servers in each node?
> > If the server list has 3 servers, then quorum is actually 3 and what is
> > described may happen in that scenario.
> > In that case C follows B, E follows D and A follows either B or D and
> > there are two working ensembles.
> > It should be possible to create problems, even with more standard
> > configuration changes:
> > If we want to change a quorum of three to a quorum of five {A,B,C} to
> > {A,B,C,D,E}:
> > - First the configuration is changed in all the nodes, but they are not
> >   restarted. Only A, B and C are running.
> > - One of them is stopped (e.g. A).
> > - At this point, if A, D and E are started with the new configuration,
> >   they may elect a leader before any of them is aware of either B or C,
> >   form an ensemble and start serving txns.
> > - However, if A is started, we wait until it connects to the leader of B
> >   and C, and then D and E are started and then B and C are restarted,
> >   everything should be ok. The fact that this depends on the human
> >   ability to start D and E while A,B and C are connected to the ensemble
> >   seems a bit risky though.
> > I have found a presentation on the topic:
> > http://www.slideshare.net/Hadoop_Summit/dynamic-reconfiguration-of-zookeeper
> >
> > If anybody knows of a safer way to change a quorum of 3 to a quorum of 5
> > with e.g. zookeeper 3.4.5, please point it out.
> >
> > Regards,
> >
> > Germán.
> >
> >
> > On Fri, Jun 14, 2013 at 11:46 PM, Jordan Zimmerman <
> > [email protected]> wrote:
> >
> >> I got the test cluster into the state described with 2 leaders. I then
> >> allocated 100 Curator clients to write nodes "/n" where n is the index
> >> (i.e. "/0", "/1", …). The idea that the nodes would be distributed
> >> around the cluster instances. I then allocated a single Curator
> >> instance dedicated to one of the servers instance, did a sync, and did
> >> an exists() to verify that each cluster instances had all the nodes.
> >> For the 2 leader cluster, this fails.
> >>
> >> -JZ
> >>
> >> On Jun 14, 2013, at 1:54 PM, "FPJ" <[email protected]> wrote:
> >>
> >> > I messed up the last sentence, here is what I was trying to say:
> >> >
> >> > It is ok to have two servers thinking they are leaders as long as
> >> > only one is able to commit txns at a time by having a quorum of
> >> > supporters. Each server is going to follow a single leader, so I
> >> > don't see a problem in your scenario with the information you
> >> > provided. Now if you tell me that when you keep sending new
> >> > transactions to those leaders, both keep committing new
> >> > transactions (not the same txns), then we have a problem. I don't
> >> > see how this can happen, though.
> >> >
> >> > Also, one of the leaders should eventually time out and go back to
> >> > leader election.
> >> >
> >> >> -----Original Message-----
> >> >> From: FPJ [mailto:[email protected]]
> >> >> Sent: 14 June 2013 21:44
> >> >> To: [email protected]
> >> >> Subject: RE: Rolling config change considered harmful?
> >> >>
> >> >> It is ok to have two servers thinking they are leaders as long as
> >> >> only one is able to commit txns at a time by having a quorum of
> >> >> supporters. Each server is going to follow a single leader, so I
> >> >> don't see a problem in your scenario with the information you
> >> >> provided. Now if you tell me that when you keep sending new
> >> >> transactions to those leaders and they keep committing them
> >> >> forever, both keep committing new transactions, then we have a
> >> >> problem. I don't see how this can happen, though.
> >> >>
> >> >> Also, one of the leaders should eventually time out and go back to
> >> >> leader election.
> >> >>
> >> >> -Flavio
> >> >>
> >> >>> -----Original Message-----
> >> >>> From: Jordan Zimmerman [mailto:[email protected]]
> >> >>> Sent: 14 June 2013 21:10
> >> >>> To: [email protected]
> >> >>> Subject: Re: Rolling config change considered harmful?
> >> >>>
> >> >>> More on this.
> >> >>>
> >> >>> I just did some testing with wholly contrived scenarios and I was
> >> >>> able to get a cluster in a state where it had two leaders. NOTE:
> >> >>> all of this was done with Curator's TestingCluster
> >> >>>
> >> >>> * Create a 5 node ensemble
> >> >>> * Save the list of instances, directories etc.
> >> >>> * Wait for quorum
> >> >>> * Shut down the cluster
> >> >>> * Restart the ensemble with the same ports and directories. However,
> >> >>>   this time, give different server lists to each instance:
> >> >>>   * Instance A -> A D E
> >> >>>   * Instance B -> A B C
> >> >>>   * Instance C -> A B C
> >> >>>   * Instance D -> A D E
> >> >>>   * Instance E -> A D E
> >> >>>
> >> >>> There is at least one common server amongst all of them. When I
> >> >>> restart the cluster with this configuration I ended up with two
> >> >>> leaders. This state stays consistent after leader election (i.e.
> >> >>> it doesn't try to re-elect).
> >> >>>
> >> >>> A: following
> >> >>> B: leading
> >> >>> C: following
> >> >>> D: leading
> >> >>> E: following
> >> >>>
> >> >>> This may be the correct behavior. i.e. it may be that ZooKeeper
> >> >>> cannot realistically run in this scenario. What it means to me is
> >> >>> that rolling config changes, if too lax, can create chaos.
> >> >>>
> >> >>> -Jordan
> >> >>>
> >> >>> On Jun 14, 2013, at 12:27 PM, "FPJ" <[email protected]> wrote:
> >> >>>
> >> >>>> In the case I described, the txn is not reflected in the zookeeper
> >> >>>> state. Say T is a create txn.
> >> >>>> Once C is elected, it determines the initial history of txns for
> >> >>>> the new epoch that is starting and this initial history is not
> >> >>>> going to include T.
> >> >>>>
> >> >>>> In the example below, I was ignoring the client that triggered T,
> >> >>>> but since it has been acked by a quorum, the client might as well
> >> >>>> have received the confirmation of the operation and think that the
> >> >>>> znode has been created.
> >> >>>>
> >> >>>> -Flavio
> >> >>>>
> >> >>>>> -----Original Message-----
> >> >>>>> From: Jordan Zimmerman [mailto:[email protected]]
> >> >>>>> Sent: 14 June 2013 20:16
> >> >>>>> To: [email protected]
> >> >>>>> Subject: Re: Rolling config change considered harmful?
> >> >>>>>
> >> >>>>> Yes - save that I'm not sure what happens with a client when a
> >> >>>>> transaction is lost. What is the error to the client? Or are you
> >> >>>>> referring to internal transactions as part of the leader election?
> >> >>>>>
> >> >>>>> -JZ
> >> >>>>>
> >> >>>>> On Jun 14, 2013, at 12:07 PM, "FPJ" <[email protected]> wrote:
> >> >>>>>
> >> >>>>>> Not sure if this helps but here is an example:
> >> >>>>>>
> >> >>>>>> - Txn T is acknowledged by A and B (ensemble is {A, B, C})
> >> >>>>>> - Ensemble changes to {B, C, D}
> >> >>>>>> - C and D form a quorum and elect C because it has the highest zxid.
> >> >>>>>>
> >> >>>>>> C won't have T, so the txn gets lost.
> >> >>>>>>
> >> >>>>>> Does it make sense?
> >> >>>>>>
> >> >>>>>> -Flavio
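
For small ensembles, the quorum intersection property Alex describes in the thread above can be checked by brute force. What follows is only a rough sketch in Python, not anything taken from ZooKeeper itself: the function names (majority_quorums, disjoint_quorum_pairs, is_safe_transition) are made up for illustration, and it assumes plain majority quorums with no weights or groups. It only checks whether every quorum of the new server list shares a server with every quorum of the old one. It does not model leader election or restart timing.

```python
from itertools import combinations


def majority_quorums(ensemble):
    """Return every minimal majority quorum of an ensemble.

    Larger quorums are supersets of these, so checking the minimal
    ones is enough for an intersection test.
    """
    members = sorted(ensemble)
    size = len(members) // 2 + 1  # strict majority
    return [frozenset(q) for q in combinations(members, size)]


def disjoint_quorum_pairs(old, new):
    """List (old_quorum, new_quorum) pairs that share no server."""
    return [
        (o, n)
        for o in majority_quorums(old)
        for n in majority_quorums(new)
        if not (o & n)
    ]


def is_safe_transition(old, new):
    """True if every new quorum intersects every old quorum."""
    return not disjoint_quorum_pairs(old, new)


# The transitions discussed in this thread (hypothetical server names A..E).
transitions = [
    ("ABC", "ABCDE"),   # add D and E in one step
    ("ABC", "ABCD"),    # add D first
    ("ABCD", "ABCDE"),  # then add E
    ("ABC", "BCD"),     # Flavio's example
]

for old, new in transitions:
    verdict = "safe" if is_safe_transition(old, new) else "UNSAFE"
    print(f"{old} -> {new}: {verdict}")
```

Under these assumptions the one-step change from {A,B,C} to {A,B,C,D,E} and Flavio's {A,B,C} to {B,C,D} example are both reported as unsafe (for instance {A,B} and {C,D,E} are disjoint), while adding D and then E one at a time passes the check at each step.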
