Hi Jordan, > Why would client 1s connection be unstable but client 2s not? In any normal > usage the ZK clients are going to be on the same network. Or, are you > thinking cross-data-center usage? In my opinion, ZooKeeper is not suited to > cross data center usage.
er... the word "unstable" I used is misleading; A full functional(or stable?) tcp connection is supposed to be encountered with some network congestion, and should / can handle this situation well, but might be with some delay of delivering the segments; High volume of traffic in LAN may lead to the above situation, and it is not rare, I think. Even if there was no such congestion, there is always a time lag, between zk sends session-timeout message and client receives the message; Without any assumption, we can not ensure that , the client could be ware of that it no longer has the lock - before other clients got the node_not_exist notification and successful executed getChildren and thought it(one of the others) having the lock. I think in practice, we could (or have to) accept this assumption : "the server’s clock advance no faster than a known constant factor faster than the client’s". But the assumption itself is not enough for the correctness of lock protocol; because the client can only passively waiting for the session_time_out message, so the client may need a timer to explicitly check time elapsed. But the recipe claims clearly that: "at any snapshot in time no two clients think they hold the same lock", and "There is no polling or timeouts." > In any event, as others have pointed out, Zookeeper is _not_ a transactional > system. > It is an eventually consistent system that will give you a reasonable degree > of distributed coordination semantics. I should admit that I do not know whether ZK is eventually consistent , transactional or not. (BTW, there is a recipe for 2pc, and some guys claim that *Zab* is Sequential Consistent); Does these properties of ZK implies there is assumptions of clock drift? >There are edge cases as you describe but they are in the level of noise. You might be right, but for me, edge cases is what I am worrying about (please do not get me wrong, I mean, different applications have different requirements / constraints). > > -Jordan > > On Jan 14, 2013, at 5:52 PM, Hulunbier <[email protected]> wrote: > >> Hi Vitalii, >> >> Thanks a lot, got your idea. >> >> Suppose we are measuring the time of events outsides the system(zk & >> clients) . >> >> And we have no client side time tracking routine. >> >> And t_i < t_k if i < k >> >> t_0 : >> >> client1 has created lock/node1, client2 has created lock/node2; >> client1 thinks itself holding the lock; client2 does not, and watching >> lock/node1. >> >> t_1 : >> >> ZK thinks client1's session is timeout(let's say, client1 is actually >> failed to send heart-beat message on time, due to a long pause of jvm >> gc). >> >> ZK deletes lock/node1, >> sends timeout message to client1, >> sends "node_not_exist" message to client2 (or send this message before >> the deletion, but it does not matter in our case) >> >> but for some reason, link between zk and client1 becomes very unstable, >> high packet loss, large amount of packet retransmission, >> which leads to a significant packet transmission delay(between client1 >> and zk only), but the tcp connection is NOT broken. >> >> t_2: >> >> client2 got the "node_not_exist" event, and issues the getChildren Cmd >> >> t_3: >> >> client2 found the only node lock/node2, and thinks itself holding the >> lock, and begins acting like a lock owner. >> >> (at the same time, client1 is also thinking itself holding the lock) >> >> t_4: >> >> session_timeout message not reach client1 yet, >> >> client1's jvm gc completed, doing something as the lock-owner. >> >> t_5: >> >> network becomes stable, finally, the session_timeout message sent from >> zk reached client1; >> >> client1 thinks itself no longer holding the lock, but it is too late, >> it has done something really bad between t_4 and t_5. >> >> -------------------------- >> >> Sorry for the grammar, I am not a native English speaker. >> >> >> On Mon, Jan 14, 2013 at 11:38 PM, Vitalii Tymchyshyn <[email protected]> >> wrote: >>> There are two events: disconnected and session expired. The ephemeral nodes >>> are removed after the second one. The client receives both. So to >>> implement "at most one lock holder" scheme, client owning lock must think >>> it've lost lock ownership since it've received disconnected event. So, >>> there is period of time between disconnect and session expired when noone >>> should have the lock. It's "safety" time to accomodate for time shifts, >>> network latencies, lock ownership recheck interval (in case when client >>> can't stop using resource immediatelly and simply checks regulary if it >>> still holds the lock). >>> >>> >>> >>> 2013/1/14 Hulunbier <[email protected]> >>> >>>> Hi Vitalii, >>>> >>>>> I don't see why clock must be in sync. >>>> >>>> I don't see any reason to precisely sync the clocks either (but if we >>>> could ... that would be wonderful.). >>>> >>>> By *some constrains of clock drift*, I mean : >>>> >>>> "Every node has a clock, and all clocks increase at the same rate" >>>> or >>>> "the server’s clock advance no faster than a known constant factor >>>> faster than the client’s.". >>>> >>>> >>>>> Also note the difference between disconnected and session >>>>> expired events. This time difference is when client knows "something's >>>>> wrong", but another client did not get a lock yet. >>>> >>>> sorry, but I failed to get your idea well; would you please give me >>>> some further explanation? >>>> >>>> >>>> On Mon, Jan 14, 2013 at 6:37 PM, Vitalii Tymchyshyn <[email protected]> >>>> wrote: >>>>> I don't see why clock must be in sync. They are counting time periods >>>>> (timeouts). Also note the difference between disconnected and session >>>>> expired events. This time difference is when client knows "something's >>>>> wrong", but another client did not get a lock yet. You will have problems >>>>> if client can't react (and release resources) between this two events. >>>>> >>>>> Best regards, Vitalii Tymchyshyn >>>>> >>>>> >>>>> 2013/1/13 Hulunbier <[email protected]> >>>>> >>>>>> Thanks Jordan, >>>>>> >>>>>>> Assuming the clocks are in sync between all participants… >>>>>> >>>>>> imho, perfect clock synchronization in a distributed system is very >>>>>> hard (if it can be). >>>>>> >>>>>>> Someone with better understanding of ZK internals can correct me, but >>>>>> this is my understanding. >>>>>> >>>>>> I think I might have missed some very important and subtile(or >>>>>> obvious?) points of the recipe / ZK protocol. >>>>>> >>>>>> I just can not believe that, there could be such type of a flaw in the >>>>>> lock-recipe, for so long time, without anybody has pointed it out. >>>>>> >>>>>> On Sun, Jan 13, 2013 at 9:31 AM, Jordan Zimmerman >>>>>> <[email protected]> wrote: >>>>>>> On Jan 12, 2013, at 2:30 AM, Hulunbier <[email protected]> wrote: >>>>>>> >>>>>>>> Suppose the network link betweens client1 and server is at very low >>>>>>>> quality (high packet loss rate?) but still fully functional. >>>>>>>> >>>>>>>> Client1 may be happily sending heart-beat-messages to server without >>>>>>>> notice anything; but ZK server could be unable to receive >>>>>>>> heart-beat-messages from client1 for a long period of time , which >>>>>>>> leads ZK server to timeout client1's session, and delete the >>>> ephemeral >>>>>>>> node >>>>>>> >>>>>>> I believe the heartbeats go both ways. Thus, if the client doesn't >>>> hear >>>>>> from the server it will post a Disconnected event. >>>>>>> >>>>>>>> But I still feels that, no matter how well a ZK application behaves, >>>>>>>> if we use ephemeral node in the lock-recipe; we can not guarantee "at >>>>>>>> any snapshot in time no two clients think they hold the same lock", >>>>>>>> which is the fundamental requirement/constraint for a lock. >>>>>>> >>>>>>> Assuming the clocks are in sync between all participants… The server >>>> and >>>>>> the client that holds the lock should determine that there is a >>>>>> disconnection at nearly the same time. I imagine that there is a certain >>>>>> amount of time (a few milliseconds) overlap here. But, the next client >>>>>> wouldn't get the notification immediately anyway. Further, when the next >>>>>> client gets the notification, it still needs to execute a getChildren() >>>>>> command, process the results, etc. before it can determine that it has >>>> the >>>>>> lock. That two clients would think they have the lock at the same time >>>> is a >>>>>> vanishingly small possibility. Even if it did happen it would only be >>>> for a >>>>>> few milliseconds at most. >>>>>>> >>>>>>> Someone with better understanding of ZK internals can correct me, but >>>>>> this is my understanding. >>>>>>> >>>>>>> -Jordan >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Best regards, >>>>> Vitalii Tymchyshyn >>>> >>> >>> >>> >>> -- >>> Best regards, >>> Vitalii Tymchyshyn >
