Generally, things like heartbeating or estimating the expiration of ephemeral nodes require reference to a clock. As such, detection of a network partition, or estimation of when an ephemeral node vanishes, can go arbitrarily wrong.
On Tue, Nov 26, 2013 at 4:57 PM, Cameron McKenzie <[email protected]> wrote:

> Excuse my ignorance (I'm relatively new to ZK), but how does the accuracy
> of the clock affect this situation?
>
>
> On Wed, Nov 27, 2013 at 11:53 AM, Ted Dunning <[email protected]> wrote:
>
> > This is not necessarily true. The old master may not have an accurate
> > clock.
> >
> > The ascending id idea that Alex mentions is a very nice way to put more
> > guarantees on the process.
> >
> >
> > On Tue, Nov 26, 2013 at 2:58 PM, Alexander Shraer <[email protected]> wrote:
> >
> > > Cameron's solution basically relies on additional timing assumptions,
> > > as Maciej mentions in his question.
> > >
> > > One more thing you could do is to implement increasing generation ids
> > > for masters, and have clients in your system reject commands from a
> > > master if they already know that a master with a higher generation id
> > > was elected (either because they saw a command from the new master or
> > > because they got a notification from ZK). This way each client can
> > > only have a single master and goes forward in time.
> > >
> > > Alex
> > >
> > > On Tue, Nov 26, 2013 at 2:34 PM, Cameron McKenzie <[email protected]> wrote:
> > >
> > > > If I'm understanding your question correctly, you're worried that when
> > > > the current 'master' loses its connection to ZooKeeper, a new 'master'
> > > > will be elected and you will have two 'master' nodes at the same time.
> > > > As soon as you lose the connection to ZooKeeper, there are no guarantees
> > > > about any of the state that you're determining from it. When you lose
> > > > the ZooKeeper connection, your 'master' must assume that it is no longer
> > > > a 'master' until it reconnects to ZooKeeper, at which point it will be
> > > > able to work out what's going on.
> > > > If you look at Apache Curator, its implementation of the leader latch
> > > > recipe handles this loss of connection and reestablishment.
> > > >
> > > > cheers
> > > > Cam
> > > >
> > > >
> > > > On Wed, Nov 27, 2013 at 9:28 AM, ms209495 <[email protected]> wrote:
> > > >
> > > > > Thanks for the reply. I want to clarify one thing.
> > > > > I am thinking of a System of 20 nodes that uses a ZooKeeper ensemble
> > > > > of 3 nodes, and of master election among these 20 nodes, which do not
> > > > > run consensus themselves but use the ZooKeeper service for master
> > > > > election. I used the term 'leader' for the leader within ZooKeeper
> > > > > (among the 3 nodes), and 'master' for the master in the System (the
> > > > > 20 nodes).
> > > > > The solution is described here:
> > > > > http://zookeeper.apache.org/doc/trunk/recipes.html#sc_leaderElection
> > > > > (I would call it 'master' election, not 'leader' election), but I
> > > > > doubt that it works reliably without the additional timing
> > > > > assumptions I described in my previous post.
> > > > > Please consider my previous post in the context of the System that
> > > > > uses ZooKeeper (not ZooKeeper itself).
> > > > >
> > > > >
> > > > > --
> > > > > View this message in context:
> > > > > http://zookeeper-user.578899.n2.nabble.com/Ensure-there-is-one-master-tp7579367p7579376.html
> > > > > Sent from the zookeeper-user mailing list archive at Nabble.com.
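The generation-id idea Alex describes is essentially a fencing token: each elected master carries a monotonically increasing id, and a client rejects any command whose id is lower than the highest it has already seen. A minimal sketch of the client-side check, assuming each command carries the master's generation id (the `FencingClient` class and `accept` method are hypothetical illustrations, not part of ZooKeeper or Curator):

```java
import java.util.concurrent.atomic.AtomicLong;

public class FencingClient {
    // Highest master generation this client has observed so far.
    private final AtomicLong highestSeenGeneration = new AtomicLong(-1);

    // Accept a command only if it comes from the newest master this client
    // knows about. Returns true if the command should be applied.
    public boolean accept(long masterGeneration) {
        while (true) {
            long seen = highestSeenGeneration.get();
            if (masterGeneration < seen) {
                return false; // stale master: reject the command
            }
            if (highestSeenGeneration.compareAndSet(seen, masterGeneration)) {
                return true; // command from the current (or a newer) master
            }
            // lost a race with a concurrent update; retry
        }
    }
}
```

Note that this guarantees only what Alex states: each individual client moves forward in time and follows a single master; it does not by itself stop a stale master from acting on resources that are not fenced this way.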

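The recipe ms209495 links uses ephemeral sequential znodes: the process whose znode has the lowest sequence number is the master, and every other process watches only the znode with the next-lower sequence number, avoiding a herd effect when the master dies. The selection logic can be sketched self-contained (no ZooKeeper calls; znode names are assumed to end in the sequence suffix ZooKeeper appends):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class LeaderElectionRecipe {
    // Given the children of the election znode (names like "n_0000000003"),
    // return empty if myNode holds the lowest sequence number (it is the
    // master), otherwise return the predecessor znode that myNode should watch.
    static Optional<String> nodeToWatch(List<String> children, String myNode) {
        List<String> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingInt(LeaderElectionRecipe::seq));
        int i = sorted.indexOf(myNode);
        return i <= 0 ? Optional.empty()
                      : Optional.of(sorted.get(i - 1));
    }

    // The sequence number is the digit run ZooKeeper appends after the
    // final '_' in the znode name.
    static int seq(String name) {
        return Integer.parseInt(name.substring(name.lastIndexOf('_') + 1));
    }
}
```

As the thread stresses, this only chooses a master; it does not prevent a partitioned ex-master from acting until its session expires, which is why the discussion turns to generation ids.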