Ben, Everybody,

What would you think if there were additional events such as "PossibleSessionExpiration", "EstimatedSessionExpiration" and "ProbableSessionExpiration"? These events would be delivered by the client at times based on the last successful heartbeat, an intermediate point, and the connection loss event, respectively.
Does this sound interesting?

On Fri, Apr 22, 2011 at 3:06 PM, Dave Wright <[email protected]> wrote:

> We ran into this exact scenario, and while it would have been nice to
> have the timer option implemented internally by ZK, we ended up
> implementing it externally ourself. We start a timer on the
> disconnected event, and when it gets "close" to the session timeout,
> we trigger the session lost behavior on the master.
> We may be without a master for a second or two, but that's OK in our
> case. As Ted mentioned, without a connection to ZK, there is no way to
> time it exactly anyway.
>
> The one advantage of having the session-lost timer running within
> zkclient instead of our app, is that it could track the timer from the
> last actual heartbeat, rather than the disconnected event. Depending
> on the network conditions that caused the disconnection, it may have
> been a while from when we actually lost connectivity to ZK to when the
> disconnection event triggers, so our own timer may not be super
> accurate. Having zkclient set a timer based on the last heartbeat, and
> triggering the session lost event when that timer expires would be
> more accurate.
>
> -Dave
>
>
> On Fri, Apr 22, 2011 at 10:03 AM, Ted Dunning <[email protected]> wrote:
>
> > Well there are real limits about what knowledge you can have in a split
> > brain and how much coordination there can be.
> >
> > Having exactly one master in such situation is impossible. You get to pick
> > your error scenario, however. One option is to have one master almost all
> > the time with a failure mode of having zero acting masters a bit of the
> > time. The other option is to have one master almost all the time with a
> > failure mode that has two masters a bit of the time. You get to pick which
> > one.
> >
> > As Ben stated, the philosophy of ZK is to report facts that can be
> > demonstrated. Your application will work pretty well with a timer even
> > though that could result in momentary double master situations. Of course,
> > it can also result in periods of zero master as well since a master cut off
> > from ZK may well be cut off from the clients who want to be served.
> >
> > So the API isn't making a promise it can't keep. It is promising to report
> > to you as soon as it is certain of things. And it does.
> >
> > On Fri, Apr 22, 2011 at 6:51 AM, Scott Fines <[email protected]> wrote:
> >
> >> I guess my objection would be that the API is making a promise that it can
> >> only deliver part of the time. If the client can't reconnect to ZooKeeper,
> >> then the client hasn't expired, which is an unusual state to find oneself
> >> in, and in leader-election systems like mine could result in having two
> >> practical leaders, while ZooKeeper is insisting that there is only one.
> >> This kind of split-brain scenario seems unavoidable in the absence of
> >> probabilistic failure checking (like timeouts).
> >>
> >> The FAQ, I've noticed, does make mention of this phenomenon. Perhaps
> >> something should be indicated there regarding the why and not just the
> >> mechanics. Otherwise, developers such as myself might find themselves
> >> unduly confused by it :)
> >>
> >> Thanks for all your help,
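
To make the timer idea in this thread concrete, here is a rough sketch of how the three proposed events could be scheduled on the client side. It is not ZooKeeper code and uses no ZooKeeper API: the class `ExpirationEstimator`, its method names, and the choice of the midpoint as the "intermediate point" are all assumptions made for illustration. Only the three event names come from the proposal above. Times are plain numbers in seconds, supplied by the caller.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ExpirationEstimator:
    """Hypothetical client-side guess at when the session may have expired."""
    session_timeout: float                    # session timeout, in seconds
    last_heartbeat: Optional[float] = None    # time of last successful heartbeat
    disconnected_at: Optional[float] = None   # time the disconnect was noticed

    def on_heartbeat(self, now: float) -> None:
        # A successful heartbeat means we are connected again: clear the timers.
        self.last_heartbeat = now
        self.disconnected_at = None

    def on_disconnected(self, now: float) -> None:
        # Keep the first disconnect time if the event is reported repeatedly.
        if self.disconnected_at is None:
            self.disconnected_at = now

    def deadlines(self) -> Dict[str, float]:
        """Return the time at which each event would fire (empty if connected)."""
        if self.disconnected_at is None:
            return {}
        # Fall back to the disconnect time if no heartbeat was ever recorded.
        start = self.last_heartbeat
        if start is None:
            start = self.disconnected_at
        possible = start + self.session_timeout
        probable = self.disconnected_at + self.session_timeout
        return {
            "PossibleSessionExpiration": possible,
            # Assumption: "intermediate point" is taken to be the midpoint.
            "EstimatedSessionExpiration": (possible + probable) / 2,
            "ProbableSessionExpiration": probable,
        }

    def due_events(self, now: float) -> List[str]:
        """Names of the events whose deadline has passed at time `now`."""
        return [name for name, at in self.deadlines().items() if at <= now]


est = ExpirationEstimator(session_timeout=30.0)
est.on_heartbeat(100.0)      # last successful heartbeat at t=100
est.on_disconnected(110.0)   # disconnect noticed at t=110
print(est.deadlines())       # deadlines at 130.0, 135.0 and 140.0
```

As Ted points out, none of these times are facts the client can demonstrate. A master that steps down on the earliest event accepts short periods with zero masters, while one that waits for the latest event accepts a greater risk of two masters.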
