Camille Fournier commented on ZOOKEEPER-922:

I'm interested in hearing in more detail what problems you believe this would 
lead to. To me, this feels like a reasonable compromise solution to a tough 
problem. If the problem you foresee is a client and server getting disconnected 
from each other while both stay alive, and this leading to a session expiration 
for the client when it reconnects to another server, then for my particular 
scenario that is fine. I have a wrapped ZK client that is highly tolerant of 
all sorts of failures and has no problem resetting its state. I realize that 
may not be acceptable for other users, and I would not propose this solution 
without either community agreement that this risk, if well-documented, is ok, 
or a fix for that problem. But I don't know what other problems you are seeing. 
I might be able to solve them if you help me see what they are, but I can't do 
anything with vague suppositions of problematic circumstances. Don't get me 
wrong, I'm not married to this solution, but I am interested in some solution 
if possible. 

It seems to me that not allowing clients to reconnect to other servers causes a 
host of other problems and is a worse solution for people that would not want 
this fast expiration forced on them. In what scenarios can a client not 
reconnect to another server? All? Obviously that won't fly because even I would 
not want to have all of my sessions expire in the case of an ensemble member 
dying and clients failing over. If we only want to do this where my code is 
doing the "touchAndClose" (i.e., when the server the client was connected to sees 
a failure-based disconnect), then we see exactly the same potential problem 
outlined above where the client could still be alive but have a switch go down 
and disconnect it from the server. Now it tries to fail over and its session is 
always dead. I'm not convinced off the bat that that is any better than letting 
it try to fail over and risking a potential session timeout race, which I think 
could possibly be fixed by associating the client session with the server 
currently maintaining it (already done but not passed through on ticks). 
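
To make the trade-off above concrete, here is a minimal sketch of the idea. It 
is not the code in the attached patch and not ZooKeeper's real session tracker. 
The class FastExpirySketch, its fields, and the explicit nowMs parameters are 
all hypothetical, and the body of touchAndClose is only an assumption about 
what such a method might do: on a failure-based disconnect, pull the session 
deadline in to at most minSessionTimeout from now, and let a later heartbeat 
from any server restore the negotiated timeout.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified tracker; not ZooKeeper's SessionTrackerImpl. */
final class FastExpirySketch {

    /** Per-session state: the negotiated timeout and the current deadline. */
    private static final class Session {
        final int negotiatedTimeoutMs;
        long expiresAtMs;

        Session(int negotiatedTimeoutMs, long nowMs) {
            this.negotiatedTimeoutMs = negotiatedTimeoutMs;
            this.expiresAtMs = nowMs + negotiatedTimeoutMs;
        }
    }

    private final Map<Long, Session> sessions = new HashMap<>();
    private final int minSessionTimeoutMs;

    FastExpirySketch(int minSessionTimeoutMs) {
        this.minSessionTimeoutMs = minSessionTimeoutMs;
    }

    void createSession(long id, int negotiatedTimeoutMs, long nowMs) {
        sessions.put(id, new Session(negotiatedTimeoutMs, nowMs));
    }

    /**
     * Normal heartbeat or successful reconnect: push the deadline out by the
     * negotiated timeout. Returns false if the session is unknown or its
     * deadline has already passed.
     */
    boolean touch(long id, long nowMs) {
        Session s = sessions.get(id);
        if (s == null || nowMs >= s.expiresAtMs) {
            sessions.remove(id);
            return false;
        }
        s.expiresAtMs = nowMs + s.negotiatedTimeoutMs;
        return true;
    }

    /**
     * Assumed behavior on a failure-based disconnect: shorten the deadline to
     * at most minSessionTimeout from now, never lengthening it.
     */
    void touchAndClose(long id, long nowMs) {
        Session s = sessions.get(id);
        if (s != null) {
            s.expiresAtMs = Math.min(s.expiresAtMs, nowMs + minSessionTimeoutMs);
        }
    }

    boolean isExpired(long id, long nowMs) {
        Session s = sessions.get(id);
        return s == null || nowMs >= s.expiresAtMs;
    }
}
```

In this sketch, a client that is still alive but cut off by a switch failure 
keeps its session only if it reaches another server within minSessionTimeout. 
A client that reconnects later finds the session gone, which is the race 
described above.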

What did you mean in the earlier comment about this causing leadership election 
issues? Does this actually interact with that at all? This is the kind of thing 
I could use guidance on. Or we can let this whole idea drop, but it does seem 
that more people than just me are interested, so it might be worth hashing out.

> enable faster timeout of sessions in case of unexpected socket disconnect
> -------------------------------------------------------------------------
>                 Key: ZOOKEEPER-922
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-922
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Camille Fournier
>            Assignee: Camille Fournier
>             Fix For: 3.4.0
>         Attachments: ZOOKEEPER-922.patch
> In the case when a client connection is closed due to socket error instead of 
> the client calling close explicitly, it would be nice to enable the session 
> associated with that client to time out faster than the negotiated session 
> timeout. This would enable a zookeeper ensemble that is acting as a dynamic 
> discovery provider to remove ephemeral nodes for crashed clients quickly, 
> while allowing for a longer heartbeat-based timeout for java clients that 
> need to do long stop-the-world GC. 
> I propose doing this by setting the timeout associated with the crashed 
> session to "minSessionTimeout".

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
