Rather than the wiki, it would be great to get this into the docs. Would you mind creating a JIRA? https://issues.apache.org/jira/browse/ZOOKEEPER
Thanks,
Patrick

On Tue, Aug 17, 2010 at 8:29 PM, Qing Yan <qing...@gmail.com> wrote:

> Thanks for the explanation! I suggest this goes to the wiki..
>
> <quote>
> the client only finds out about session expiration events when the client
> reconnects to the cluster. if zk tells a client that its session is
> expired, the ephemerals that correspond to that session will already be
> cleaned up.
>
> - deletion of an ephemeral file due to loss of client connection will
>   occur after the client gets a connection loss
>
> - deletion of an ephemeral file will precede delivery of a session
>   expiration event to the owner
> </quote>
>
> So session expiration means two things here: the server view (ephemeral
> cleanup) and the client view (event delivery), and there is no guarantee
> how long it will take in between, correct?
>
> I guess the confusion arises from the documentation, which doesn't
> distinguish these two concepts, e.g. in the javadoc
> http://hadoop.apache.org/zookeeper/docs/r3.3.1/api/index.html
>
> An ephemeral node will be removed by the ZooKeeper automatically when the
> session associated with the creation of the node expires.
>
> It is actually referring to the server view, not the client view.
>
> On Wed, Aug 18, 2010 at 1:12 AM, Ted Dunning <ted.dunn...@gmail.com>
> wrote:
>
> > Uncharacteristically, I think that Ben's comments could use a little bit
> > of amplification.
> >
> > First, ZK is designed with certain guarantees in mind and almost all
> > operational characteristics flow logically from these guarantees.
> >
> > The guarantee that Ben mentioned here in passing is that if a client gets
> > session expiration, it is *guaranteed* that the ephemerals have been
> > cleaned up.
> > This guarantee is what drives the notification of session expiration
> > after reconnection, since while the client is disconnected, it cannot
> > know if the cluster is operating correctly or not and thus cannot know
> > if the ephemerals have been cleaned up yet. The only way to have certain
> > knowledge that the cluster has cleaned up the ephemerals is to get back
> > in touch with an operating cluster.
> >
> > The client is not completely in the dark. As Ben implied, it can know
> > that the cluster is unavailable (it got a ConnectionLoss event, after
> > all). While the cluster is unavailable and before it gets a session
> > expiration notification, the client can go into safe mode.
> >
> > The moral of this story is that to get the most out of ZK, it is best to
> > adopt the same guarantee-based design process that drove ZK in the first
> > place. The first step is that you have to decide what guarantees you
> > want to provide and then work from ZK's guarantees to get to yours.
> >
> > In the classic leader-election use of ZK, the key guarantee that we want
> > is:
> >
> > - the number of leaders is less than or equal to 1
> >
> > Note that you can't guarantee that the number == 1, because other stuff
> > could happen. This has nothing to do with ZK.
> >
> > The pertinent ZK guarantees are:
> >
> > - an ephemeral file can only be created by a single session
> >
> > - deletion of an ephemeral file due to loss of client connection will
> >   occur after the client gets a connection loss
> >
> > - deletion of an ephemeral file will precede delivery of a session
> >   expiration event to the owner
> >
> > Phrased in terms of CSP-like constructs, the client has events
> > BecomeMaster, EnterSafeMode, ExitSafeMode, RelinquishMaster and Crash
> > that must occur according to this grammar:
> >
> > client := (
> >     (BecomeMaster; (EnterSafeMode; ExitSafeMode)*;
> >         EnterSafeMode?; RelinquishMaster)
> >   | (BecomeMaster; (EnterSafeMode; ExitSafeMode)*; EnterSafeMode?; Crash)
> >   | Crash
> > )*
> >
> > To get the guarantees that we want, we can require the client to only do
> > BecomeMaster after it creates an ephemeral file and require it to either
> > Crash, RelinquishMaster or EnterSafeMode before that ephemeral file is
> > deleted. The only way that we can do that is to immediately do
> > EnterSafeMode on connection loss and then do RelinquishMaster on session
> > expiration or ExitSafeMode on connection restored. It is involved, but
> > you can actually do a proof of correctness from this that shows that
> > your guarantee will be honored even in the presence of ZK or the client
> > crashing or being partitioned.
> >
> > On Tue, Aug 17, 2010 at 9:26 AM, Benjamin Reed <br...@yahoo-inc.com>
> > wrote:
> >
> > > there are two things to keep in mind when thinking about this issue:
> > >
> > > 1) if a zk client is disconnected from the cluster, the client is
> > > essentially in limbo. because the client cannot talk to a server it
> > > cannot know if its session is still alive. it also cannot close its
> > > session.
> > >
> > > 2) the client only finds out about session expiration events when the
> > > client reconnects to the cluster.
> > > if zk tells a client that its session is expired, the ephemerals that
> > > correspond to that session will already be cleaned up.
> > >
> > > one of the main design points about zk is that zk only gives correct
> > > information. if zk cannot give correct information, it basically says
> > > "i don't know". connection loss exceptions and disconnected states are
> > > basically "i don't know".
> > >
> > > generally applications we design go into a "safe" mode, meaning they
> > > may serve reads but reject changes, when disconnected from zk and only
> > > kill themselves when they find out their session has expired.
> > >
> > > ben
> > >
> > > ps - session information is replicated to all zk servers, so if a
> > > leader dies, all replicas know the sessions that are currently active
> > > and their timeouts.
> > >
> > > On 08/16/2010 09:03 PM, Ted Dunning wrote:
> > >
> > >> Ben or somebody else will have to repeat some of the detailed logic
> > >> for this, but it has to do with the fact that you can't be sure what
> > >> has happened during the network partition. One possibility is the one
> > >> you describe, but another is that the partition happened because a
> > >> majority of the ZK cluster lost power and you can't see the remaining
> > >> nodes. Those nodes will continue to serve any files in a read-only
> > >> fashion. If the partition involves you losing contact with the entire
> > >> cluster at the same time a partition of the cluster into a quorum and
> > >> a minority happens, then your ephemeral files could continue to exist
> > >> at least until the breach in the cluster itself is healed.
> > >>
> > >> Suffice it to say that there are only a few strategies that leave you
> > >> with a coherent picture of the universe.
> > >> Importantly, you shouldn't assume that the ephemerals will disappear
> > >> at the same time as the session expiration event is delivered.
> > >>
> > >> On Mon, Aug 16, 2010 at 8:31 PM, Qing Yan <qing...@gmail.com> wrote:
> > >>
> > >>> Ouch, is this the current ZK behavior? This is unexpected. If the
> > >>> client gets partitioned from the ZK cluster, it should get notified
> > >>> and take some action (e.g. commit suicide); otherwise how do you
> > >>> tell whether an ephemeral node is really up or down? Zombies can
> > >>> create synchronization nightmares..
> > >>>
> > >>> On Mon, Aug 16, 2010 at 7:22 PM, Dave Wright <wrig...@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> Another possible cause for this that I ran into recently with the C
> > >>>> client - you don't get the session expired notification until you
> > >>>> are reconnected to the quorum and it informs you the session is
> > >>>> lost. If you get disconnected and can't reconnect you won't get the
> > >>>> notification. Personally I think the client api should track the
> > >>>> session expiration time locally and inform you once it's expired.
> > >>>>
> > >>>> On Aug 16, 2010 2:09 AM, "Qing Yan" <qing...@gmail.com> wrote:
> > >>>>
> > >>>> Hi Ted,
> > >>>>
> > >>>> Do you mean GC problem can prevent delivery of SESSION EXPIRE
> > >>>> event? Hum...so you have met this problem before?
> > >>>> I didn't see any OOM though, will look into it more.
> > >>>>
> > >>>> On Mon, Aug 16, 2010 at 12:46 PM, Ted Dunning
> > >>>> <ted.dunn...@gmail.com> wrote:
> > >>>>
> > >>>> I am assuming that y...
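
For anyone who wants the safe-mode recipe from this thread in concrete form, here is a minimal sketch of the client-side state machine implied by Ted's grammar (BecomeMaster, EnterSafeMode, ExitSafeMode, RelinquishMaster). It deliberately does not use any ZooKeeper client library: ZkEvent, Role, LeaderStateMachine, lock_acquired and on_event are hypothetical names made up for illustration. The sketch assumes the application maps its client's disconnected, reconnected and session-expired notifications onto the three ZkEvent values, and calls lock_acquired() only after it has successfully created the ephemeral node.

```python
from enum import Enum, auto


class ZkEvent(Enum):
    """Connection-level events (hypothetical names for this sketch)."""
    CONNECTED = auto()      # (re)connected and the session is still valid
    DISCONNECTED = auto()   # connection loss: zk is saying "i don't know"
    EXPIRED = auto()        # session expired: ephemerals are already gone


class Role(Enum):
    FOLLOWER = auto()       # does not hold the ephemeral node
    MASTER = auto()         # holds the ephemeral node, may accept changes
    SAFE_MODE = auto()      # was master, now disconnected: reads only


class LeaderStateMachine:
    """Tracks the role and records the grammar events it emits."""

    def __init__(self):
        self.role = Role.FOLLOWER
        self.log = []

    def _emit(self, action, new_role):
        self.log.append(action)
        self.role = new_role

    def lock_acquired(self):
        # Assumption: called only after the ephemeral node was created.
        if self.role is Role.FOLLOWER:
            self._emit("BecomeMaster", Role.MASTER)

    def on_event(self, event):
        if event is ZkEvent.DISCONNECTED and self.role is Role.MASTER:
            # Enter safe mode immediately on connection loss.
            self._emit("EnterSafeMode", Role.SAFE_MODE)
        elif event is ZkEvent.CONNECTED and self.role is Role.SAFE_MODE:
            # Session survived the disconnect, so mastership is intact.
            self._emit("ExitSafeMode", Role.MASTER)
        elif event is ZkEvent.EXPIRED and self.role in (Role.MASTER, Role.SAFE_MODE):
            # The ephemeral node has already been deleted by the cluster.
            self._emit("RelinquishMaster", Role.FOLLOWER)

    def may_accept_writes(self):
        return self.role is Role.MASTER
```

Feeding it lock_acquired() followed by DISCONNECTED, CONNECTED, DISCONNECTED, EXPIRED yields the trace BecomeMaster, EnterSafeMode, ExitSafeMode, EnterSafeMode, RelinquishMaster, which is one of the sequences the grammar allows. Writes are rejected whenever the role is not MASTER, matching Ben's "serve reads but reject changes" description of safe mode.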