Thanks, Alex, for the detailed explanations. They really help fill in my understanding of the implementation details left open by the papers/presentations I've read (without having to read the code yet :-) ). #2 is what I was unsure of, but it makes perfect sense.
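To check my understanding of #2, here is a minimal Python sketch of the commit condition as I read it: a reconfiguration needs acks from a quorum of the old config and from a quorum of the new config. The function names and the simple-majority quorum rule are my own assumptions for illustration; this is not ZooKeeper's actual code.

```python
def is_quorum(acks, config):
    """True if `acks` contains a strict majority of the members of `config`.

    Assumption: plain majority quorums over the voting members.
    """
    return len(set(acks) & set(config)) > len(config) // 2


def reconfig_can_commit(acks, old_config, new_config):
    """A reconfig needs a quorum of the old config AND a quorum of the new one."""
    return is_quorum(acks, old_config) and is_quorum(acks, new_config)


OLD = {"A", "B", "C", "D", "E"}
NEW = {"D", "E", "F", "G", "H"}

# A, B and C alone are a majority of the old config ...
abc_is_old_quorum = is_quorum({"A", "B", "C"}, OLD)
# ... but they include no member of the new config, so a candidate leader
# among them cannot complete the reconfiguration on its own.
abc_can_commit = reconfig_can_commit({"A", "B", "C"}, OLD, NEW)
# A, D, E, F covers 3 of 5 in the old config and 3 of 5 in the new one.
adef_can_commit = reconfig_can_commit({"A", "D", "E", "F"}, OLD, NEW)

print(abc_is_old_quorum, abc_can_commit, adef_can_commit)
```

If this matches the real rule, it explains why a candidate leader among A-C has to give up when D-H never connect to it.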
Obviously committing the new configuration to the internal database is a prerequisite to committing on a server, but is writing the new *configuration file* to disk also a prerequisite for committing the new configuration? I'm curious about this so I can match it with my observations, since reading the configuration file is much easier than getting the database state.

~Jared

On Sat, Jul 28, 2012 at 11:02 AM, Alexander Shraer <[email protected]> wrote:

> Hi Jared,
>
> Figuring out what happened and how to recover is part of the
> reconfiguration protocol. I don't think that this is something you as a
> user should do, unless I misunderstand what you're trying to do. This
> should be handled by ZooKeeper just like it handles other failures without
> admin intervention.
>
> In your scenario, D-F come up and one of them is elected leader (since you
> said they know about the commit), so they start running the new config
> normally. When A-C come up, several things may happen:
>
> 1. During the preliminary FastLeaderElection, A-C will try to connect to D
> and E, and in fact they'll also try to connect with the new config members
> that they know were proposed. So most likely someone in the new
> config will send them the new config file and they'll store it and act
> accordingly (connect as non-voting followers in the new config). To make
> this happen, I changed FastLeaderElection to talk with proposed configs (if
> known) and to piggyback the last active config you know of on all messages.
>
> 2. It's possible that somehow A-C complete FastLeaderElection without
> talking to D-F. But since a reconfiguration was committed, it was acked by
> a quorum of the old config (and a quorum of the new one). Therefore,
> whoever is "elected" in the old config knows about the reconfig proposal
> (this is guaranteed by normal ZooKeeper leader recovery).
> Before doing anything else, the new leader among A-C will try to complete the
> reconfiguration, which involves getting acks from a quorum of the
> new config. But in your scenario the servers in the new config will not
> connect to it because they moved on, so the candidate-leader will just give
> up and go back to (1) above.
>
> 3. In the remote chance that someone who heard about the reconfig commit
> connects to a candidate-leader who didn't hear about it, the first thing it
> does is tell that candidate-leader that it's not up to date, and the
> leader just updates its config file, gives up on being a leader and returns
> to (1). This was done by changing the first message that a
> follower/observer sends to a leader it is connecting to, even before the
> synchronization starts.
>
> Alex
>
> On Sat, Jul 28, 2012 at 8:43 AM, Jared Cantwell <[email protected]> wrote:
>
>> So I'm working through some failure scenarios and I want to make sure I
>> fully understand the way that dynamic membership changes previous behavior,
>> so are my expectations correct in this situation:
>>
>> As in my previous example, let's say that the current membership of voting
>> participants is {A,B,C,D,E} and we're looking to change membership to
>> {D,E,F,G,H}.
>> 1. Reconfiguration to {D,E,F,G,H} completes internally
>> 2. D-F update their local configuration files, but A-C do not yet.
>> 3. Power loss to all nodes
>>
>> Now what happens if A, B, and C come up with configuration files that
>> still say {A,B,C,D,E}, but no other servers start up yet? Can A, B and C
>> form a quorum and elect a leader since they all agree on the same state?
>> What then happens when the new membership of D-H starts up?
>>
>> We're trying to automatically handle node failures during reconfiguration
>> situations, but it seems like without being able to query all nodes to make
>> sure you know of the latest membership list there is no safe way to do
>> this.
>> I'm wondering if only doing single node additions/removals would
>> create less complicated failure scenarios. What are your thoughts and best
>> practices around this?
>>
>> Thanks!
>> Jared
>>
>> On Fri, Jul 27, 2012 at 8:57 PM, Jared Cantwell <[email protected]> wrote:
>>
>>> We are trying to remove the need for all admin intervention so that is
>>> one failure scenario that is interesting to us.
>>>
>>> Jared
>>>
>>> On Jul 27, 2012, at 7:42 PM, Alexander Shraer <[email protected]> wrote:
>>>
>>> Yes, this entry will be deleted. I don't like this either - if a new
>>> follower reboots before being added to the config it will not be able to boot up
>>> without manual help from an admin. That's why I'm considering
>>> removing the check that a participant must always initially be in its own
>>> config, but for now it's there.
>>>
>>> Alex
>>>
>>> On Fri, Jul 27, 2012 at 6:34 PM, Jared Cantwell <[email protected]> wrote:
>>>
>>>> Sorry for the confusion in terminology, I was unfamiliar with the exact
>>>> leader/follower semantics previously.
>>>>
>>>> So if all connected servers update their config file, does that mean
>>>> that non-voting followers who aren't part of the new ensemble will lose the
>>>> entry specific to them in their config file? I can test this myself, but
>>>> getting an inside perspective is very helpful.
>>>>
>>>> Thanks again for the help!
>>>> Jared
>>>>
>>>> On Jul 27, 2012, at 6:55 PM, Alexander Shraer <[email protected]> wrote:
>>>>
>>>> Yes, any number of followers which are not in the configuration can
>>>> just connect and listen in. This has always been the case, also in 3.4, I
>>>> just made use of this for the purpose of adding members during
>>>> reconfiguration.
>>>> Moreover, in 3.4 there is this bug,
>>>> ZOOKEEPER-1113 <https://issues.apache.org/jira/browse/ZOOKEEPER-1113>,
>>>> where the leader actually counts the votes of anyone connected,
>>>> regardless of config membership :) This is fixed in ZK-107, so they are
>>>> really non-voting followers.
>>>>
>>>> > I am assuming that's the case, and that it is a follower (and not
>>>> > participant) by virtue of not being in the official configuration stored in
>>>> > zookeeper itself.
>>>>
>>>> Follower and participant types of servers are not something that was
>>>> defined in ZK-107. In ZooKeeper every follower/leader is a "participant".
>>>> It's just that the votes of participants that are not in the configuration
>>>> are not counted; that's why we call them non-voting followers. BTW,
>>>> obviously a non-voting follower cannot become leader (like ZK-1113, this
>>>> was also not enforced before ZK-107).
>>>>
>>>> > And a followup... does zookeeper only overwrite the dynamic
>>>> > configuration file for nodes that are voting participants? Such that if I
>>>> > started a follower and then left it running through some
>>>> > reconfigurations, its file would not get updated if it was never added as
>>>> > part of those reconfigurations?
>>>>
>>>> No, as soon as it connects to the current leader, its dynamic config
>>>> file is overwritten with the current configuration as part of the
>>>> synchronization with the leader. Every time a new configuration is
>>>> committed, all connected servers (voting, non-voting, observers) will
>>>> update their dynamic config file, whether or not they're in the config.
>>>>
>>>> Alex
>>>>
>>>> On Fri, Jul 27, 2012 at 5:35 PM, Jared Cantwell <[email protected]> wrote:
>>>>
>>>>> So does just having the server started and pointing to the existing
>>>>> ensemble automatically make it a "non participating follower"?
>>>>> In other words, there is no need to inform the existing nodes that this new node is
>>>>> joining as a follower? And to extend that, there could be any number of
>>>>> followers that are simply listening in on the event stream? I am assuming
>>>>> that's the case, and that it is a follower (and not participant) by virtue
>>>>> of not being in the official configuration stored in zookeeper itself.
>>>>>
>>>>> On Fri, Jul 27, 2012 at 6:29 PM, Alexander Shraer <[email protected]> wrote:
>>>>>
>>>>>> There are just two supported types - participant and observer
>>>>>> (a participant can act as either follower or leader).
>>>>>>
>>>>>> So you can either write participant or leave it unspecified (which
>>>>>> means participant by default). Also, since the IP is the same for all your
>>>>>> ports you don't have to write it twice. All of these should work in the
>>>>>> same way:
>>>>>>
>>>>>> server.5=10.10.5.17:2182:2183:participant;10.10.5.17:2181
>>>>>> server.5=10.10.5.17:2182:2183:participant;2181
>>>>>> server.5=10.10.5.17:2182:2183;10.10.5.17:2181
>>>>>> server.5=10.10.5.17:2182:2183;2181
>>>>>>
>>>>>> On Fri, Jul 27, 2012 at 5:25 PM, Jared Cantwell <[email protected]> wrote:
>>>>>>
>>>>>>> Thanks Alex for the response. Our current lines in the
>>>>>>> configuration look like this:
>>>>>>>
>>>>>>> server.5=10.10.5.17:2182:2183:participant;10.10.5.17:2181
>>>>>>>
>>>>>>> For the new servers is it ok for their entry to have "participant"?
>>>>>>> Or should that be something different (e.g. "follower")?
>>>>>>>
>>>>>>> ~Jared
>>>>>>>
>>>>>>> On Fri, Jul 27, 2012 at 6:20 PM, Alexander Shraer <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Jared,
>>>>>>>>
>>>>>>>> Thanks for experimenting with this feature.
>>>>>>>>
>>>>>>>> The idea is that new servers join as "non voting followers".
>>>>>>>> This means that they act as normal followers but the leader ignores their votes
>>>>>>>> since they are not part of the current configuration. The leader only
>>>>>>>> counts their votes during the reconfiguration itself (to make sure a quorum
>>>>>>>> of the new config is ready before the new config can be
>>>>>>>> committed/activated). Defining them as observers is not a good idea. For
>>>>>>>> example, in your scenario, if they were observers they wouldn't be able to
>>>>>>>> participate in the reconfiguration protocol (which is similar to the
>>>>>>>> protocol for committing any other operation, in which observers don't
>>>>>>>> participate), and since we wouldn't have a quorum of followers in the new
>>>>>>>> config that can ack, reconfiguration would throw an exception (of
>>>>>>>> KeeperException.NEWCONFIGNOQUORUM type).
>>>>>>>> Of course, if you intend them to be observers in the new config you
>>>>>>>> can define them as observers, since their votes are not needed during
>>>>>>>> reconfig anyway.
>>>>>>>>
>>>>>>>> You're right, the new servers must be able to connect to the old
>>>>>>>> quorum. At minimum, their file should contain the current leader, but
>>>>>>>> you can also copy the current configuration file to the new members
>>>>>>>> if you wish.
>>>>>>>>
>>>>>>>> In addition, you should add a line for the member itself, so that
>>>>>>>> server F appears in F's config file (it's not important that the other new
>>>>>>>> servers appear in F's file, but it won't hurt either, so you can do a union
>>>>>>>> of old and new if you wish). The constructor of QuorumPeer checks that the
>>>>>>>> server itself is in the configuration it's started with, otherwise it's not
>>>>>>>> going to run. This check has always been there, but I'm thinking of
>>>>>>>> possibly changing it in the future.
>>>>>>>>
>>>>>>>> As soon as F connects to the leader, its config file will be
>>>>>>>> overwritten with the current config file as part of the synchronization
>>>>>>>> process.
>>>>>>>>
>>>>>>>> Alex
>>>>>>>>
>>>>>>>> On Fri, Jul 27, 2012 at 10:06 AM, Jared Cantwell <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> We are testing integration with 3.5.0 and dynamic membership and I have a
>>>>>>>>> question. If I have a current set of servers in my ensemble {A,B,C,D,E}
>>>>>>>>> and I want to reconfigure the ensemble to {D,E,F,G,H}, how should the
>>>>>>>>> dynamic config file on servers F,G,H be configured on startup? Should they
>>>>>>>>> have the old ensemble, the new ensemble, or the union of both ensembles?
>>>>>>>>> It seems like these new servers need to know about the old quorum, but
>>>>>>>>> since they aren't part of it yet it's not clear to me how they should be
>>>>>>>>> configured. Should there be an intermediate configuration with F, G, and H
>>>>>>>>> as simply Observers?
>>>>>>>>>
>>>>>>>>> I can't find much documentation on this so I want to make sure I understand
>>>>>>>>> things correctly.
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>> ~Jared
