Re: locking/leader election and dealing with session loss

Jordan Zimmerman Thu, 16 Jul 2015 06:02:59 -0700

Of course there are myriad theoretical possibilities. But I don’t believe any 
of what you’ve mentioned will happen in production. For any reasonable case, 
you can be guaranteed that no two processes will consider themselves lock 
holders at the same instant in time.


-Jordan


On July 16, 2015 at 7:58:06 AM, Ivan Kelly (iv...@apache.org) wrote:

On Thu, Jul 16, 2015 at 1:38 PM Jordan Zimmerman <jor...@jordanzimmerman.com>  
wrote:  

> Are you really seeing 30s gc pauses in production? If so, then of course  
> this could happen. However, if your application can tolerate a 30s pause  
> (which is hard to believe) then your session timeout is too low. The point  
> of the session timeout is to have enough coverage. So, if your app allows
> 30-second pauses, your session timeout would have to be much longer.
>  
GC is just an example. There are other ways the same scenario could happen.
The machine could swap out the process due to load. Someone could do
something stupid in the ZooKeeper event thread so that the session expired
event is delayed. The state update could have hit the IP stack during a
network partition, and the process then got wedged. The state update packet
could have hit the network and been routed via the moon. The clock could
break.

If you are relying on a timer on the zk client to maintain a guarantee,  
then you really aren't giving any guarantee because the zk client doesn't  
have control over all the things that could go wrong.  

-Ivan
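
The scenario Ivan describes can be made concrete with a small simulation. This is only a sketch under stated assumptions: it does not use ZooKeeper or Curator, and every name in it (FakeClock, LockService, run_scenario) is hypothetical. It models a service that expires a lock once no heartbeat has arrived within the session timeout, and a client "A" that checks it holds the lock, is paused for any reason, and then acts on the stale check.

```python
class FakeClock:
    """Hypothetical deterministic clock so the pause can be simulated."""
    def __init__(self):
        self.now = 0.0

    def advance(self, seconds):
        self.now += seconds


class LockService:
    """Toy stand-in for a coordination service (NOT the ZooKeeper API)."""
    def __init__(self, clock, session_timeout):
        self.clock = clock
        self.session_timeout = session_timeout
        self.holder = None
        self.last_heartbeat = None

    def _expire_if_stale(self):
        # The service drops the lock once the holder has been silent too long.
        if (self.holder is not None
                and self.clock.now - self.last_heartbeat > self.session_timeout):
            self.holder = None

    def acquire(self, client):
        self._expire_if_stale()
        if self.holder is None:
            self.holder = client
            self.last_heartbeat = self.clock.now
            return True
        return False


def run_scenario(pause_seconds, session_timeout=10.0):
    clock = FakeClock()
    service = LockService(clock, session_timeout)
    writes = []  # who wrote to the protected resource, in order

    service.acquire("A")
    # A checks its local view: it holds the lock and the session looks fresh.
    a_thinks_it_holds = True

    # A is paused (GC, swap, wedged thread, ...) between the check and the write.
    clock.advance(pause_seconds)

    # Meanwhile B asks for the lock; the service may have expired A's session.
    b_acquired = service.acquire("B")
    if b_acquired:
        writes.append("B")

    # A resumes and acts on its stale check without asking the service again.
    if a_thinks_it_holds:
        writes.append("A")

    return b_acquired, writes
```

With a pause longer than the session timeout, run_scenario(30.0) yields a run in which both B and A write; with a short pause, run_scenario(1.0) leaves A as the only writer. The simulation does not say how likely the long pause is in production, which is the point the two authors disagree on; it only shows that a client-side check cannot rule the overlap out by itself.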
