Thanks a lot, Alexander. That's a great starting point. I will look into the code.

On Wednesday, August 20, 2014, Alexander Shraer <[email protected]> wrote:

> I think its:
>
> src/java/main/org/apache/zookeeper/server/quorum/Leader.java,
> waitForEpochAck throws exception if the follower is ahead of the leader in
> terms of data, like in your example
>
> src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java, run()
> throws exception if follower has a more up-to-date configuration than
> leader.
>
> Since a leader needs support from a quorum, when trying to become leader
> one of the servers who knows about d3 will need to connect to it (since d3
> was committed and every two majorities intersect). So C will not be able to
> gather the required support without triggering the checks above.
>
> In fact C is very unlikely to get that far as to try to become the leader -
> as Henry mentioned ZooKeeper has a preliminary protocol called
> FastLeaderElection.java which tries to make sure that the candidate leader
> has the most up-to-date data and support from a quorum. This is how the
> candidate is chosen and then the other servers establish connections to
> this candidate. The checks above are in case by the time connections are
> established to the candidate leader some server from whom he previously
> didn't hear in FastLeaderElection tries to connect and the candidate leader
> discovers that he shouldn't really be the leader. Then he gives up and
> returns back to FastLeaderElection.
>
>
> On Wed, Aug 20, 2014 at 10:42 AM, Gaurav Saxena <[email protected]> wrote:
>
> > Thanks! That's great... If someone can point me to the code where this is
> > decided, it will be a great help... as I have to present evidence that this
> > scenario will not happen
> >
> >
> > On Wed, Aug 20, 2014 at 10:33 AM, Henry Robinson <[email protected]>
> > wrote:
> >
> > > IIRC, C cannot become the master because it does not have all the changes
> > > that A and B have seen. The leader election protocol can take care of
> > > ensuring the invariant that the elected master must be the most up-to-date
> > > of all peers. (Alternatively, the new master can request the missing log
> > > suffix from the peers during election, but I believe, although it's a while
> > > since I checked, that ZK does the former. Someone can fill in the details /
> > > correct me).
> > >
> > > Henry
> > >
> > >
> > > On 20 August 2014 10:24, Gaurav Saxena <[email protected]> wrote:
> > >
> > > > I am curious about a seemingly data loss scenario. I describe it below
> > > >
> > > > There are three zookeeper servers A, B, and C.
> > > > 1. At one point in time t1 the state of the system is as follows:
> > > > A is up and contains data d1, d2. A is master
> > > > B is up and contains data d1, d2
> > > > C is up and contains data d1, d2
> > > >
> > > > 2. At time t2 C goes down. The state of the system at t2 is
> > > > A is up and contains data d1, d2. A is master
> > > > B is up and contains data d1, d2
> > > > C is down and its log contains data d1, d2
> > > >
> > > > 3. At time t3 the state of the system changes
> > > > A is up and contains data d1, d2, d3. A is master
> > > > B is up and contains data d1, d2, d3
> > > > C is down and its log contains data d1, d2
> > > >
> > > > 4. At time t4, C comes up and also becomes the master, while A and B are
> > > > also up
> > > >
> > > > Question: Because C is master, will the logs of A and B be truncated to
> > > > contain only d1 and d2? Is this considered a data loss scenario? If yes, is
> > > > there an issue around it?
> > > >
> > > > --
> > > > Regards
> > > > Gaurav Saxena
> > > >
> > >
> > >
> > > --
> > > Henry Robinson
> > > Software Engineer
> > > Cloudera
> > > 415-994-6679
> > >
> >
> >
> > --
> > Regards
> > Gaurav Saxena
> >
>

--
Regards
Gaurav Saxena
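
For anyone reading this thread later, the argument above (the most up-to-date server is preferred, and a stale candidate cannot collect a quorum) can be illustrated with a small sketch. This is not the ZooKeeper source. The class ElectionSketch, the Vote type, and the methods isAhead, supersedes, elect and canLead are hypothetical names made up for illustration, and mapping d1, d2, d3 to zxids 1, 2, 3 within a single epoch is an assumption. The ordering (epoch first, then last zxid, then server id as a tie-break) is only loosely modeled on what FastLeaderElection is described as doing.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: a simplified model, not the ZooKeeper source.
final class ElectionSketch {

    /** Hypothetical summary of what a server advertises during election. */
    static final class Vote {
        final String serverId;
        final long epoch;
        final long lastZxid; // id of the last transaction in this server's log

        Vote(String serverId, long epoch, long lastZxid) {
            this.serverId = serverId;
            this.epoch = epoch;
            this.lastZxid = lastZxid;
        }
    }

    /** True if a's data is strictly newer than b's (higher epoch, then higher zxid). */
    static boolean isAhead(Vote a, Vote b) {
        return a.epoch > b.epoch || (a.epoch == b.epoch && a.lastZxid > b.lastZxid);
    }

    /** True if candidate should replace current as the preferred leader. */
    static boolean supersedes(Vote candidate, Vote current) {
        if (isAhead(candidate, current)) return true;
        if (isAhead(current, candidate)) return false;
        // Equal data: break the tie by server id (assumed rule for this sketch).
        return candidate.serverId.compareTo(current.serverId) > 0;
    }

    /** Picks the preferred leader among all votes. */
    static Vote elect(List<Vote> votes) {
        Vote best = votes.get(0);
        for (Vote v : votes) {
            if (supersedes(v, best)) best = v;
        }
        return best;
    }

    /**
     * A candidate can lead only if a strict majority of the ensemble is not
     * ahead of it. Servers that are ahead refuse to follow, which stands in
     * for the checks in Leader and LearnerHandler mentioned in the thread.
     */
    static boolean canLead(Vote candidate, List<Vote> ensemble) {
        int supporters = 0;
        for (Vote v : ensemble) {
            if (!isAhead(v, candidate)) supporters++;
        }
        return supporters > ensemble.size() / 2;
    }

    public static void main(String[] args) {
        // State at t4 in the scenario: A and B have d1..d3, C has only d1..d2.
        List<Vote> ensemble = Arrays.asList(
                new Vote("A", 1, 3), new Vote("B", 1, 3), new Vote("C", 1, 2));
        System.out.println("elected: " + elect(ensemble).serverId);
        System.out.println("C can lead: " + canLead(ensemble.get(2), ensemble));
    }
}
```

In this model C gets support only from itself, which is not a majority of three, so the truncation described in step 4 cannot arise. The real evidence is still the code paths named above in Leader.java, LearnerHandler.java and FastLeaderElection.java.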
