Hi Jordan,

> Why would client 1's connection be unstable but client 2's not? In any normal 
> usage the ZK clients are going to be on the same network. Or are you 
> thinking of cross-data-center usage? In my opinion, ZooKeeper is not suited to 
> cross-data-center usage.

Er... the word "unstable" I used is misleading. A fully functional (or
"stable"?) TCP connection is expected to run into some network
congestion now and then. It can handle that situation well, though
segments may be delivered with some delay. A high volume of traffic on
a LAN can lead to exactly this situation, and I don't think it is rare.

Even if there were no such congestion, there is always a time lag
between ZK sending the session-timeout message and the client receiving
it. Without some assumption, we cannot ensure that the client becomes
aware that it no longer holds the lock before another client gets the
node_not_exist notification, successfully executes getChildren, and
concludes that it (the other client) holds the lock.

I think in practice we could (or have to) accept this assumption:
"the server’s clock advance no faster than a known constant factor
faster than the client’s".

But the assumption by itself is not enough for the correctness of the
lock protocol: because the client can only wait passively for the
session_time_out message, it may also need a timer to check the
elapsed time explicitly.
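A minimal sketch of such a client-side timer, assuming the bounded-drift factor `rho` from the quoted assumption and assuming the client records the local time at which it *sent* the last heartbeat that the server acknowledged. `LeaseTimer`, `rho`, and `margin` are hypothetical names, not part of the ZooKeeper API; the clock is injectable so the logic can be checked without real time.

```python
import time
from typing import Callable, Optional


class LeaseTimer:
    """Hypothetical client-side guard (not a ZooKeeper API).

    Assumes the server clock advances at most `rho` times faster than the
    client clock (rho >= 1) and that neither clock runs backwards.
    """

    def __init__(self, session_timeout: float, rho: float, margin: float = 0.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if rho < 1.0:
            raise ValueError("rho must be >= 1")
        self._session_timeout = session_timeout
        self._rho = rho
        self._margin = margin          # extra slack, e.g. for scheduling delays
        self._clock = clock            # injectable so tests can fake time
        self._lease_start: Optional[float] = None

    def on_heartbeat_acked(self, sent_at: float) -> None:
        """Record the local send time of a heartbeat the server acknowledged.

        The server received it no earlier than `sent_at`, so measuring the
        lease from here errs on the safe side.
        """
        if self._lease_start is None or sent_at > self._lease_start:
            self._lease_start = sent_at

    def local_budget(self) -> float:
        # The server expires the session after `session_timeout` of server
        # time, which takes at least session_timeout / rho of client time.
        return self._session_timeout / self._rho - self._margin

    def may_act_as_owner(self) -> bool:
        """Call immediately before every action that requires the lock."""
        if self._lease_start is None:
            return False
        return self._clock() - self._lease_start < self.local_budget()
```

Calling `may_act_as_owner()` right before each protected action narrows the window discussed here, but it cannot close it completely: a pause that falls between the check and the action (such as the GC pause in the t_0..t_5 scenario below) still defeats it.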

But the recipe clearly claims that "at any snapshot in time no two
clients think they hold the same lock" and that "There is no polling or
timeouts."


> In any event, as others have pointed out, Zookeeper is _not_ a transactional 
> system.

> It is an eventually consistent system that will give you a reasonable degree 
> of distributed coordination semantics.

I should admit that I do not know whether ZK is eventually consistent
or transactional. (BTW, there is a recipe for 2PC, and some people
claim that *Zab* is sequentially consistent.)

Do these properties of ZK imply that there are assumptions about clock drift?

>There are edge cases as you describe but they are in the level of noise.

You might be right, but for me, the edge cases are exactly what I am
worried about (please don't get me wrong; I mean that different
applications have different requirements / constraints).

>
> -Jordan
>
> On Jan 14, 2013, at 5:52 PM, Hulunbier <[email protected]> wrote:
>
>> Hi Vitalii,
>>
>> Thanks a lot, I get your idea.
>>
>> Suppose we measure the time of events from outside the system (ZK &
>> clients).
>>
>> And there is no client-side time-tracking routine.
>>
>> And t_i < t_k if i < k.
>>
>> t_0 :
>>
>> client1 has created lock/node1, client2 has created lock/node2;
>> client1 thinks it holds the lock; client2 does not, and is watching
>> lock/node1.
>>
>> t_1 :
>>
>> ZK thinks client1's session has timed out (let's say client1 actually
>> failed to send a heartbeat message on time, due to a long JVM GC
>> pause).
>>
>> ZK deletes lock/node1,
>> sends a timeout message to client1,
>> sends a "node_not_exist" message to client2 (or sends this message before
>> the deletion, but that does not matter in our case);
>>
>> but for some reason the link between ZK and client1 becomes very unstable:
>> high packet loss and a large amount of packet retransmission,
>> which lead to a significant packet transmission delay (between client1
>> and ZK only), but the TCP connection is NOT broken.
>>
>> t_2:
>>
>> client2 gets the "node_not_exist" event and issues the getChildren command.
>>
>> t_3:
>>
>> client2 finds the only node, lock/node2, thinks it holds the
>> lock, and begins acting as the lock owner.
>>
>> (At the same time, client1 also thinks it holds the lock.)
>>
>> t_4:
>>
>> The session_timeout message has not reached client1 yet;
>>
>> client1's JVM GC completes, and it does something as the lock owner.
>>
>> t_5:
>>
>> The network finally becomes stable, and the session_timeout message sent
>> from ZK reaches client1;
>>
>> client1 thinks it no longer holds the lock, but it is too late:
>> it has done something really bad between t_4 and t_5.
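To make the overlap in this timeline concrete, here is a small sketch that models each client's belief "I hold the lock" as a half-open interval over the t_i (using the subscripts as timestamps) and reports where two beliefs overlap. The intervals are read off the scenario above; `overlapping_ownership` is a hypothetical helper, not anything in ZooKeeper.

```python
from typing import Dict, List, Optional, Tuple

# A client's belief "I hold the lock" as a half-open interval [start, end);
# end=None means the belief is still held when the scenario ends.
Interval = Tuple[int, Optional[int]]


def overlapping_ownership(beliefs: Dict[str, Interval],
                          horizon: int) -> List[Tuple[str, str, int, int]]:
    """Return (a, b, start, end) for every pair of clients whose beliefs
    overlap on [start, end). `horizon` closes open-ended intervals."""
    names = sorted(beliefs)
    overlaps = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            a_start, a_end = beliefs[a]
            b_start, b_end = beliefs[b]
            start = max(a_start, b_start)
            end = min(horizon if a_end is None else a_end,
                      horizon if b_end is None else b_end)
            if start < end:
                overlaps.append((a, b, start, end))
    return overlaps


# client1 believes it owns the lock from t_0 until the expiry reaches it at t_5;
# client2 believes it from t_3 onward.
scenario = {"client1": (0, 5), "client2": (3, None)}
print(overlapping_ownership(scenario, horizon=6))
```

For this scenario it reports a single overlap, `('client1', 'client2', 3, 5)`, which contains client1's owner action at t_4.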
>>
>> --------------------------
>>
>> Sorry for the grammar, I am not a native English speaker.
>>
>>
>> On Mon, Jan 14, 2013 at 11:38 PM, Vitalii Tymchyshyn <[email protected]> 
>> wrote:
>>> There are two events: disconnected and session expired. The ephemeral nodes
>>> are removed after the second one. The client receives both. So, to
>>> implement an "at most one lock holder" scheme, the client owning the lock
>>> must consider its ownership lost as soon as it receives the disconnected
>>> event. So there is a period of time between disconnect and session expiry
>>> when no one should hold the lock. It's "safety" time to accommodate time
>>> shifts, network latencies, and the lock ownership recheck interval (in case
>>> the client can't stop using the resource immediately and simply checks
>>> regularly whether it still holds the lock).
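The rule above can be sketched as a tiny state machine. `KeeperState` here only borrows the names of the ZooKeeper Java client's `Watcher.Event.KeeperState` values (`SyncConnected`, `Disconnected`, `Expired`); `LockOwnership` and `on_reverified` are hypothetical, and the actual re-check after reconnecting is left out.

```python
from enum import Enum


class KeeperState(Enum):
    # Stand-in for the Java client's Watcher.Event.KeeperState values
    # relevant here; only the names are borrowed.
    SYNC_CONNECTED = "SyncConnected"
    DISCONNECTED = "Disconnected"
    EXPIRED = "Expired"


class LockOwnership:
    """Hypothetical bookkeeping for the "at most one lock holder" rule:
    stop acting as owner on Disconnected, not only on Expired."""

    def __init__(self) -> None:
        self.acquired = False   # our lock node was the lowest when last checked
        self.suspended = False  # connection lost; ownership is in doubt

    def on_acquired(self) -> None:
        self.acquired = True
        self.suspended = False

    def on_state(self, state: KeeperState) -> None:
        if state is KeeperState.DISCONNECTED:
            # Start of the "safety" window: behave as if the lock were lost.
            self.suspended = True
        elif state is KeeperState.EXPIRED:
            # The server has removed our ephemeral node; ownership is gone.
            self.acquired = False
            self.suspended = False
        # SYNC_CONNECTED alone does not restore ownership; see on_reverified().

    def on_reverified(self, still_lowest: bool) -> None:
        """Result of re-reading the lock children after reconnecting
        within the same session."""
        self.acquired = self.acquired and still_lowest
        self.suspended = False

    def may_act_as_owner(self) -> bool:
        return self.acquired and not self.suspended
```

This only helps if the disconnected event is handled before the client's next owner action; a long GC pause, as in the t_0..t_5 scenario above, can delay both the detection and the delivery of that event.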
>>>
>>>
>>>
>>> 2013/1/14 Hulunbier <[email protected]>
>>>
>>>> Hi Vitalii,
>>>>
>>>>> I don't see why clocks must be in sync.
>>>>
>>>> I don't see any reason to precisely sync the clocks either (but if we
>>>> could... that would be wonderful).
>>>>
>>>> By *some constraints on clock drift*, I mean:
>>>>
>>>> "Every node has a clock, and all clocks increase at the same rate"
>>>> or
>>>> "the server’s clock advance no faster than a known constant factor
>>>> faster than the client’s.".
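One way to spell out what the second assumption buys (a sketch; the symbols are introduced here and are not from the thread). Let $T$ be the session timeout measured on the server's clock, $\rho \ge 1$ the known factor, $s$ the instant the client sent the last heartbeat it saw acknowledged, $r \ge s$ the instant the server last received a heartbeat, and $t_{\mathrm{exp}}$ the instant the server expires the session, so that the server clock advances at least $T$ over $[r, t_{\mathrm{exp}}]$. Writing $\Delta_S[a,b]$ and $\Delta_C[a,b]$ for how far the server and client clocks advance between instants $a$ and $b$ (both clocks assumed monotonic):

```latex
\Delta_S[a,b] \le \rho\,\Delta_C[a,b]
\quad\Longrightarrow\quad
\Delta_C[s,t_{\mathrm{exp}}]
  \;\ge\; \Delta_C[r,t_{\mathrm{exp}}]
  \;\ge\; \frac{\Delta_S[r,t_{\mathrm{exp}}]}{\rho}
  \;\ge\; \frac{T}{\rho}
```

So a client that acts as owner only while its own clock shows $\Delta_C[s,t] < T/\rho$ (in practice minus a safety margin) has $\Delta_C[s,t] < \Delta_C[s,t_{\mathrm{exp}}]$ at every such instant, which forces $t < t_{\mathrm{exp}}$: it stops before the server deletes its ephemeral node, and therefore before any other client can be notified.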
>>>>
>>>>
>>>>> Also note the difference between the disconnected and session
>>>>> expired events. This time difference is when the client knows "something's
>>>>> wrong", but another client has not got the lock yet.
>>>>
>>>> Sorry, but I didn't quite get your idea; could you please explain it
>>>> further?
>>>>
>>>>
>>>> On Mon, Jan 14, 2013 at 6:37 PM, Vitalii Tymchyshyn <[email protected]>
>>>> wrote:
>>>>> I don't see why clocks must be in sync. They count time periods
>>>>> (timeouts). Also note the difference between the disconnected and session
>>>>> expired events. This time difference is when the client knows "something's
>>>>> wrong", but another client has not got the lock yet. You will have problems
>>>>> if the client can't react (and release resources) between these two events.
>>>>>
>>>>> Best regards, Vitalii Tymchyshyn
>>>>>
>>>>>
>>>>> 2013/1/13 Hulunbier <[email protected]>
>>>>>
>>>>>> Thanks Jordan,
>>>>>>
>>>>>>> Assuming the clocks are in sync between all participants…
>>>>>>
>>>>>> IMHO, perfect clock synchronization in a distributed system is very
>>>>>> hard (if it is possible at all).
>>>>>>
>>>>>>> Someone with better understanding of ZK internals can correct me, but
>>>>>>> this is my understanding.
>>>>>>
>>>>>> I think I might have missed some very important and subtle (or
>>>>>> obvious?) points of the recipe / ZK protocol.
>>>>>>
>>>>>> I just cannot believe that such a flaw could have existed in the
>>>>>> lock recipe for so long without anybody pointing it out.
>>>>>>
>>>>>> On Sun, Jan 13, 2013 at 9:31 AM, Jordan Zimmerman
>>>>>> <[email protected]> wrote:
>>>>>>> On Jan 12, 2013, at 2:30 AM, Hulunbier <[email protected]> wrote:
>>>>>>>
>>>>>>>> Suppose the network link between client1 and the server is of very low
>>>>>>>> quality (high packet loss rate?) but still fully functional.
>>>>>>>>
>>>>>>>> Client1 may be happily sending heartbeat messages to the server without
>>>>>>>> noticing anything, but the ZK server could be unable to receive
>>>>>>>> heartbeat messages from client1 for a long period of time, which
>>>>>>>> leads the ZK server to time out client1's session and delete the
>>>>>>>> ephemeral node.
>>>>>>>
>>>>>>> I believe the heartbeats go both ways. Thus, if the client doesn't
>>>>>>> hear from the server it will post a Disconnected event.
>>>>>>>
>>>>>>>> But I still feel that, no matter how well a ZK application behaves,
>>>>>>>> if we use ephemeral nodes in the lock recipe, we cannot guarantee "at
>>>>>>>> any snapshot in time no two clients think they hold the same lock",
>>>>>>>> which is the fundamental requirement/constraint for a lock.
>>>>>>>
>>>>>>> Assuming the clocks are in sync between all participants… The server and
>>>>>>> the client that holds the lock should determine that there is a
>>>>>>> disconnection at nearly the same time. I imagine that there is a certain
>>>>>>> amount of overlap (a few milliseconds) here. But the next client
>>>>>>> wouldn't get the notification immediately anyway. Further, when the next
>>>>>>> client gets the notification, it still needs to execute a getChildren()
>>>>>>> command, process the results, etc. before it can determine that it has the
>>>>>>> lock. That two clients would think they have the lock at the same time is a
>>>>>>> vanishingly small possibility. Even if it did happen it would only be for a
>>>>>>> few milliseconds at most.
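The client-side step described here (after the notification, call getChildren() and decide from the result) can be sketched as a pure function over the returned child names. It follows the lock recipe's rule that the lowest sequence number holds the lock and that every other client watches the node with the next lower sequence number. The child names use a "prefix + 10-digit counter" form rather than the simplified lock/node1, lock/node2 used earlier; `seq_of` and `check_lock` are hypothetical helpers, and the real ZooKeeper calls (getChildren, exists with a watch) are left out.

```python
from typing import List, Optional, Tuple


def seq_of(name: str) -> int:
    """Sequence number of a sequential znode name.

    ZooKeeper appends a 10-digit, zero-padded counter to sequential nodes,
    e.g. "lock-0000000002" -> 2.
    """
    return int(name[-10:])


def check_lock(children: List[str], me: str) -> Tuple[bool, Optional[str]]:
    """Decide ownership from a getChildren() result.

    Returns (True, None) if our node has the lowest sequence number;
    otherwise (False, predecessor), where predecessor is the node with the
    next lower sequence number, i.e. the one to watch.
    """
    ordered = sorted(children, key=seq_of)
    if me not in ordered:
        # Our ephemeral node is gone, e.g. because the session expired.
        raise LookupError(f"lock node {me!r} no longer exists")
    idx = ordered.index(me)
    if idx == 0:
        return True, None
    return False, ordered[idx - 1]
```

In the t_0..t_5 scenario above, client2 effectively runs this check at t_3 on a child list that no longer contains client1's node and gets `(True, None)`; nothing in that result tells client2 whether client1 has already learned of its expiry.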
>>>>>>>
>>>>>>> Someone with better understanding of ZK internals can correct me, but
>>>>>>> this is my understanding.
>>>>>>>
>>>>>>> -Jordan
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Vitalii Tymchyshyn
>>>>
>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Vitalii Tymchyshyn
>
