This can be emulated on Linux by simply pausing the process. The correct behavior is that the old leader freezes and, if it resumes relatively soon, is still recognized as leader.
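As a minimal sketch of one way to do the pausing: on Linux, SIGSTOP suspends a process and SIGCONT resumes it. The helper names below (`freeze`, `thaw`, `pause_for`) are made up for illustration, and the pid would be that of the ZooKeeper server process under test.

```python
import os
import signal
import time


def freeze(pid: int) -> None:
    """Suspend the process; SIGSTOP cannot be caught or ignored."""
    os.kill(pid, signal.SIGSTOP)


def thaw(pid: int) -> None:
    """Resume a process previously suspended with SIGSTOP."""
    os.kill(pid, signal.SIGCONT)


def pause_for(pid: int, seconds: float) -> None:
    """Hypothetical helper: freeze the process, wait, then resume it."""
    freeze(pid)
    try:
        time.sleep(seconds)
    finally:
        # Always resume, even if the sleep is interrupted.
        thaw(pid)
```

A short pause should leave the leader in place; a pause long enough for the rest of the quorum to give up on the leader should trigger the election described below.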
If the pause is long enough, then the other members of the quorum will decide that they have lost contact with the leader and initiate a new leader election. That election will cause the epoch to be incremented.

When the old leader returns, it may attempt to commit a change. Such a commit will be rejected due to an old epoch. Alternately, it will get a ping or a commit from the other servers and realize that it is behind and initiate a resynchronization.

Even if the old leader had started a commit before being paused, the commit will have either succeeded in becoming durable or not. Neither case will cause any discrepancies since the leader election will cause the remaining quorum to agree on a correct state.

In any case, the paused server should either survive as leader with the assent of a quorum or it should realize it is no longer the leader and transparently update itself to the current state of the quorum.

On Wed, Mar 7, 2012 at 9:48 AM, Scott Lindner <[email protected]> wrote:

> ...
> This got us to wondering what would happen if the elected leader were
> "frozen" in this manner? There's no guarantees where in the code it would
> be hung to know for certain what would happen when it left this state, but
> could there be any problems where the "frozen" server would come out of
> this state still thinking it was the leader (since it was stuck) when in
> fact another server had been elected in the meantime? I would imagine this
> should resolve itself fairly quickly but is there still a possibility that
> this could lead to bad behavior? Typically if a server fails I would
> imagine the zookeeper instance would die or lose leadership because of an
> event (failed connection, etc) but this seems slightly different since the
> code would be blocked in a random state.
> ...
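To make the epoch check from the reply concrete, here is a toy model of it. This is only an illustration under simplifying assumptions, not ZooKeeper's actual implementation: the `Peer` class, `try_commit`, and the majority rule over a plain list of peers are all invented for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Toy quorum member that remembers the newest epoch it has seen."""
    epoch: int = 0
    log: list = field(default_factory=list)

    def on_election(self, new_epoch: int) -> None:
        # A completed election moves the peer to the new, higher epoch.
        self.epoch = max(self.epoch, new_epoch)

    def on_proposal(self, leader_epoch: int, change: str) -> bool:
        # Proposals stamped with an older epoch come from a stale leader.
        if leader_epoch < self.epoch:
            return False
        self.log.append((leader_epoch, change))
        return True


def try_commit(leader_epoch: int, change: str, peers: list) -> bool:
    """A change commits only if a majority of peers accept the proposal."""
    acks = sum(p.on_proposal(leader_epoch, change) for p in peers)
    return acks > len(peers) // 2


peers = [Peer(epoch=1) for _ in range(3)]
before_pause = try_commit(1, "a", peers)   # old leader, still current

for p in peers:                            # leader paused; others elect anew
    p.on_election(2)

stale_attempt = try_commit(1, "b", peers)  # old leader wakes up and tries
new_leader = try_commit(2, "c", peers)     # new leader proceeds normally
```

In this model the woken leader's proposal gathers no acknowledgements, which is its cue to resynchronize with the quorum.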
