[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect

2010-11-18 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933560#action_12933560
 ] 

Camille Fournier commented on ZOOKEEPER-922:


My kingdom for a virtual whiteboard! 

I will take some time and write this up.

> enable faster timeout of sessions in case of unexpected socket disconnect
> -------------------------------------------------------------------------
>
> Key: ZOOKEEPER-922
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-922
> Project: Zookeeper
> Issue Type: Improvement
> Components: server
> Reporter: Camille Fournier
> Assignee: Camille Fournier
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-922.patch
>
>
> In the case when a client connection is closed due to socket error instead of 
> the client calling close explicitly, it would be nice to enable the session 
> associated with that client to time out faster than the negotiated session 
> timeout. This would enable a ZooKeeper ensemble that is acting as a dynamic 
> discovery provider to remove ephemeral nodes for crashed clients quickly, 
> while allowing for a longer heartbeat-based timeout for Java clients that 
> need to do long stop-the-world GC. 
> I propose doing this by setting the timeout associated with the crashed 
> session to "minSessionTimeout".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect

2010-11-18 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933557#action_12933557
 ] 

Benjamin Reed commented on ZOOKEEPER-922:
-

camille, i also think disabling moving sessions is not a good idea or very 
useful, but it seems to be the only way to have sensible semantics. 

may i suggest that we take this discussion a bit higher? i think there are 
fundamental assumptions that you are making that i'm questioning. can you write 
up a high-level design and state your assumptions? i can't quite see how the 
math works out between the client-server timeouts, connect timeouts, and lower 
session timeout. i'm also not clear on how much you are relying on a connection 
reset for the failure detection.




[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect

2010-11-17 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932973#action_12932973
 ] 

Camille Fournier commented on ZOOKEEPER-922:


From my point of view, a solution that enables faster expiration but disables 
clients moving sessions to other servers is not a solution I would use. I am 
not willing to take the massive hit of restarting possibly huge numbers of 
sessions in the case of a single node failure. I expect the case where a 
disconnect happens and the client is actually still alive to be vanishingly 
rare. My clients will die all the time and my ensemble members might die 
occasionally; if a switch dies, there are much bigger problems than some 
overaggressive session expiration.

So, Server A has a connection with the client. The switch between client and A 
dies and both see an error disconnect. Possible operations (in some order) 
after this point:
- A sends a ping on that session with a lower session timeout
- Client connects to B, which will touch the session table with the negotiated 
  session timeout
- Client starts heartbeating

Scenarios:
1) If A sends the ping with the lower session timeout, and the client cannot 
connect to B before the session expires, the session is expired and no harm no 
foul in my opinion. Sessions expiring due to lag on failover are a possibility 
that anyone using ZK should be defensively programming against.

2) Due to a lag on A's part, it did not send the timeout-lowering ping until 
after the client had connected to B. The client's session timeout stays lowered 
until the client heartbeats to B and B pings the leader. 
Or, the client might not respond to the heartbeat in this sensitive interval, 
causing it to have its session disconnected. This could quite possibly be 
solved by actually checking that a ping is coming from the current owner of a 
session if it is trying to set the timeout lower than the current timeout. The 
session tracker has the current owner stored. I wouldn't want to have to check 
this on every ping, but it's quite easy to add the logic back that checks if 
the new timeout is lower than the existing timeout, and then check to see if 
the pinger is the current owner. That might require code changes we don't want 
to do, but it seems possible. Alternatively, the session just unexpectedly 
times out. I'm writing defensive code against all possible failures of the ZK, 
so a session timeout is not a huge deal to me.

3) A pings the leader during the client connection negotiation with B. I 
suspect there are several possible interleavings here. I would also expect that 
again the worst case should be that the client sees a session expired error. 
This is the area to dig into more carefully. If there is an interleaving that 
could leave the session open forever, or cause ensemble instability, that would 
probably be a deal-breaker.
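
To make scenario 2 concrete, here is a toy model of the race, written as a rough sketch rather than anything taken from the patch. The SessionTable class, the alive_at helper, and both timeout values are made up for illustration; the real session tracker is more involved. The table simply lets the last touch win, which is what allows A's late ping to shorten a session that B already refreshed.

```python
NEGOTIATED_MS = 30_000  # assumed negotiated session timeout (illustrative value)
MIN_MS = 4_000          # assumed minSessionTimeout (illustrative value)


class SessionTable:
    """Hypothetical stand-in for the leader's session table."""

    def __init__(self):
        self.deadline_ms = {}  # session id -> absolute expiry time in ms

    def touch(self, session_id, now_ms, timeout_ms):
        # No owner check: whoever touches last sets the deadline.
        self.deadline_ms[session_id] = now_ms + timeout_ms

    def is_expired(self, session_id, now_ms):
        return now_ms >= self.deadline_ms[session_id]


def alive_at(check_ms, heartbeat_at_ms=None):
    """Replay scenario 2 and report whether the session is alive at check_ms."""
    table = SessionTable()
    sid = 1
    table.touch(sid, 0, NEGOTIATED_MS)  # t=0: client reconnects to B
    table.touch(sid, 100, MIN_MS)       # t=100: A's late ping lowers the timeout
    # A heartbeat through B only helps if it lands before the lowered deadline.
    if heartbeat_at_ms is not None and not table.is_expired(sid, heartbeat_at_ms):
        table.touch(sid, heartbeat_at_ms, NEGOTIATED_MS)
    return not table.is_expired(sid, check_ms)


print(alive_at(5_000, heartbeat_at_ms=2_000))  # True: heartbeat beat the short deadline
print(alive_at(5_000))                         # False: live client, but session expired
```

The second call is the unexpected timeout described above: the client is alive and connected to B, yet the session expires because the lowered deadline was never refreshed in time.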






[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect

2010-11-17 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932909#action_12932909
 ] 

Flavio Junqueira commented on ZOOKEEPER-922:


Hi Camille, say a client disconnects from server A and reconnects to server B, 
same session. Server A believes the session should be expired because it 
received an exception. Server B believes the session should stay alive, since 
the client just reconnected. What should we do in this case? Kill the session 
or not?

Our suggestion is to have an option that enables fast expiration and disables 
clients moving sessions to other servers. We are certainly not proposing to 
remove the second functionality from ZooKeeper altogether.




[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect

2010-11-16 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932649#action_12932649
 ] 

Camille Fournier commented on ZOOKEEPER-922:


I'm interested in hearing the problems that you believe it would lead to in 
more detail. To me, this feels like a reasonable compromise solution to a tough 
problem. If the problem you foresee is a client and server getting disconnected 
from each other but both staying alive, and this causing weirdness leading to a 
session expiration for the client on reconnecting to another server, for my 
particular scenario that is fine. I have a wrapped ZK client that is highly 
tolerant to all sorts of failures and has no problem resetting its state. I 
realize that may not be acceptable for other users, and I would not propose 
this solution without either community agreement that this risk, if 
well-documented, is ok, or a fix for that problem. But I don't know what other 
problems you are seeing and while I might be able to solve them if you help me 
see what they are, I can't do anything on vague suppositions of problematic 
circumstances. Don't get me wrong, I'm not married to this solution, but I am 
interested in some solution if possible. 

It seems to me that not allowing clients to reconnect to other servers causes a 
host of other problems and is a worse solution for people that would not want 
this fast expiration forced on them. In what scenarios can a client not 
reconnect to another server? All? Obviously that won't fly because even I would 
not want to have all of my sessions expire in the case of an ensemble member 
dying and clients failing over. If we only want to do this where my code is 
doing the "touchAndClose" (i.e., when the server the client was connected to sees 
a failure-based disconnect), then we see exactly the same potential problem 
outlined above where the client could still be alive but have a switch go down 
and disconnect it from the server. Now it tries to fail over and its session is 
always dead. I'm not convinced off the bat that that is any better than letting 
it try to fail over and risking a potential session timeout race, which I think 
could possibly be fixed by associating the client session with the server 
currently maintaining it (already done but not passed through on ticks). 

What did you mean in the earlier comment about this causing leadership election 
issues? Does this actually interact with that at all? This is the kind of thing 
I could use guidance on. Or we can let this whole idea drop, but it does seem 
that more people than me are interested so might be worth hashing it out.




[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect

2010-11-16 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932639#action_12932639
 ] 

Benjamin Reed commented on ZOOKEEPER-922:
-

if we had a foolproof way to tell that a client is down, we could do this fast 
expire. the methods you are proposing are not foolproof and will lead to 
problems exactly when you most want them not to.

the timeout interactions you are talking about are problematic. it's really 
hard to get them right.

one way that i can see this working is to not allow clients to reconnect to 
other servers. in that case a socket reset would indicate an expired session. is 
this acceptable to you?




[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect

2010-11-16 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932530#action_12932530
 ] 

Camille Fournier commented on ZOOKEEPER-922:


This only changes the timeout in case of a detected exception that closes the 
socket (see the patch). The purpose in fact is to enable environments with 
machines that may have long GC pause times to have long max session timeouts, 
while still clearing the ephemeral nodes of crashed clients more quickly.
The only crash I am intending to deal with here is the crash that causes an 
exception closing the socket on the server side. This can also be caused by a 
switch failure, but in my environment it is much much much much more likely to 
be caused by the client process crashing. I don't expect to be able to 
perfectly deal with all cases of client crash, because there are some that 
don't cause the socket to close and that can't be differentiated from a client 
doing a long full GC. 




[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect

2010-11-16 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932512#action_12932512
 ] 

Flavio Junqueira commented on ZOOKEEPER-922:


I think I understand your motivation, but I'm not sure it will work the way you 
expect it to work. I'm afraid that you might end up getting lots of false 
positives due to delays introduced by the environment (e.g., jvm gc). Let me 
clarify one thing first: when you refer to clients crashing, are you thinking 
about the jvm crashing or the whole machine becoming unavailable? Basically my 
question is if you really expect connections to be cleanly closed or not.





[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect

2010-11-16 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932502#action_12932502
 ] 

Camille Fournier commented on ZOOKEEPER-922:


If the client connects to another server within whatever time they set 
minSessionTimeout to, the client should heartbeat the session and it shouldn't 
get timed out. Otherwise, their session will be expired and they'll get the 
standard expired session scenario. 
If you are working in a setup where you think that unexpected disconnects will 
largely be due to clients crashing and you want ephemeral data aggressively 
removed in that scenario, with this design you set the minSessionTimeout to a 
low value and allow ZK to quickly time out those sessions. If you are 
working in a setup where unexpected disconnects are more likely to be due to 
network problems, or you want to give data a longer time to survive, you have 
the option of setting the timeout to a higher value (ideally the same as the 
negotiated session timeout, but that might require a code change to match 
negotiation), which should give the same behavior as now.
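
The rule described here can be written down as a small function. This is only a sketch under simplifying assumptions: session_deadline_ms and its parameters are hypothetical names, a single reconnect heartbeat stands in for ongoing heartbeating, and races between servers are ignored.

```python
def session_deadline_ms(disconnect_ms, min_session_timeout_ms,
                        negotiated_timeout_ms, reconnect_heartbeat_ms=None):
    """Return when the session would expire after an error disconnect.

    Assumed model: on an error disconnect the session's timeout is dropped to
    minSessionTimeout; a heartbeat via another server before that deadline
    restores the negotiated timeout.
    """
    lowered_deadline = disconnect_ms + min_session_timeout_ms
    if reconnect_heartbeat_ms is not None and reconnect_heartbeat_ms < lowered_deadline:
        return reconnect_heartbeat_ms + negotiated_timeout_ms
    return lowered_deadline


# Crashed client, aggressive setting: ephemerals go away 2s after the disconnect.
print(session_deadline_ms(0, 2_000, 30_000))                              # 2000
# Live client that fails over within 500ms keeps its session.
print(session_deadline_ms(0, 2_000, 30_000, reconnect_heartbeat_ms=500))  # 30500
# minSessionTimeout equal to the negotiated timeout: no faster expiry.
print(session_deadline_ms(0, 30_000, 30_000))                             # 30000
```

The knob is the gap between the two timeouts: the smaller minSessionTimeout is relative to the negotiated value, the faster crashed clients are cleaned up and the less time a live client has to fail over.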




[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect

2010-11-16 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932500#action_12932500
 ] 

Flavio Junqueira commented on ZOOKEEPER-922:


Hi! I'm confused by this proposal. What happens if the client disconnects from 
one server and moves to another? Or you want to be able to disable that feature 
as well?




[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect

2010-11-09 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930407#action_12930407
 ] 

Patrick Hunt commented on ZOOKEEPER-922:


@camille NP, although it makes it easier for us (reviewers) if all the patches 
are consistent. For future reference then. Thanks.

ps. you might get more insightful review if you post to apache's new 
reviewboard server: https://reviews.apache.org

Regards.




[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect

2010-11-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930249#action_12930249
 ] 

Hadoop QA commented on ZOOKEEPER-922:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12459059/ZOOKEEPER-922.patch
  against trunk revision 1033155.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified tests.
    Please justify why no new tests are needed for this patch.
    Also please list what manual steps were performed to verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/7//console

This message is automatically generated.




[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect

2010-11-09 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930248#action_12930248
 ] 

Camille Fournier commented on ZOOKEEPER-922:


I wasn't really expecting this patch to be applied by the build, since it is 
just an illustration of a possible solution for the problem (and has no unit 
tests or anything). Do you still want to run it through the build given that?




[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect

2010-11-09 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930193#action_12930193
 ] 

Patrick Hunt commented on ZOOKEEPER-922:


Hi Camille, the patch has to be created from the top most directory ("trunk") 
for hudson to apply the patch correctly, please see:
http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute

(basically checkout trunk, make changes, do "svn diff" at the toplevel)

Thanks!




[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect

2010-11-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930011#action_12930011
 ] 

Hadoop QA commented on ZOOKEEPER-922:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12459059/ZOOKEEPER-922.patch
  against trunk revision 1032882.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified tests.
    Please justify why no new tests are needed for this patch.
    Also please list what manual steps were performed to verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/2//console

This message is automatically generated.




[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect

2010-11-08 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929789#action_12929789
 ] 

Camille Fournier commented on ZOOKEEPER-922:


After some thought, one approach to fix this might be to have the leader send 
the cnxn info through the session touch call in the case of PING, and only 
allow the timeout for a session to be lowered if the requester is the current 
owner of that session. It feels like a hack (you probably wouldn't want to 
force a valid "owner" to be checked for each touch), but I think it would solve 
that particular race condition. 
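
A minimal sketch of the owner check described above, assuming a simplified in-memory session table. The class `OwnerCheckedSessionTable`, its methods, the server-id strings, and both timeout values are hypothetical illustrations; this is not the attached patch and not ZooKeeper's actual session tracking code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified session table; not ZooKeeper's real implementation. */
class OwnerCheckedSessionTable {
    static final int NEGOTIATED_TIMEOUT_MS = 30_000;  // assumed negotiated timeout
    static final int MIN_SESSION_TIMEOUT_MS = 4_000;  // assumed minSessionTimeout

    private final Map<Long, Integer> timeoutMs = new HashMap<>();
    private final Map<Long, String> owner = new HashMap<>();

    /** A touch carries the id of the server holding the connection; the latest toucher becomes the owner. */
    void touch(long sessionId, String serverId) {
        timeoutMs.putIfAbsent(sessionId, NEGOTIATED_TIMEOUT_MS);
        owner.put(sessionId, serverId);
    }

    /** Lowers the timeout only if the requester currently owns the session; returns whether it did. */
    boolean lowerTimeout(long sessionId, String requester) {
        if (!requester.equals(owner.get(sessionId))) {
            return false; // stale request from a server the client has already left
        }
        timeoutMs.put(sessionId, MIN_SESSION_TIMEOUT_MS);
        return true;
    }

    /** Returns the current timeout, or -1 for an unknown session. */
    int timeoutOf(long sessionId) {
        return timeoutMs.getOrDefault(sessionId, -1);
    }

    public static void main(String[] args) {
        OwnerCheckedSessionTable table = new OwnerCheckedSessionTable();
        table.touch(1L, "follower1");
        table.touch(1L, "follower2"); // client moved to follower2
        System.out.println("stale request applied: " + table.lowerTimeout(1L, "follower1"));
        System.out.println("timeout now: " + table.timeoutOf(1L));
    }
}
```

In this sketch a late socket-error report from the previous server is rejected, while the current owner can still shorten the timeout. The cost noted above remains: every touch has to carry and record an owner.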




[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect

2010-11-08 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929687#action_12929687
 ] 

Camille Fournier commented on ZOOKEEPER-922:


That is a valid concern. For the system I am implementing, I would rather 
aggressively time out connections I believe to be closed, at the risk of 
occasionally hitting this particular edge case (my client can automatically 
re-connect and re-establish its ephemeral data if necessary), but it's worth 
thinking about whether it is possible to avoid.
I realize the extreme end of this argument is just to set the session timeout 
lower and let the gc-ing clients re-establish their state, but I want the best 
of both worlds.




[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect

2010-11-08 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929683#action_12929683
 ] 

Benjamin Reed commented on ZOOKEEPER-922:
-

how do you deal with the following race condition:

1) the client is connected to follower1
2) the client has problems talking to follower1, so it closes the connection
3) the client connects to follower2
4) follower1 detects the closed connection and sets the session timeout to 
min
5) the client is idle for min timeout and the leader expires the session

the race is between steps 3) and 4). if follower1 doesn't detect the dead 
connection until after the client has reconnected, it improperly lowers the 
timeout of a live session.
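
The five steps above can be replayed against a deliberately naive session table to make the interleaving concrete. This is only a sketch: `NaiveSessionTable`, its methods, the timestamps, and both timeout values are hypothetical and do not reflect ZooKeeper's real classes or the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical leader-side table that lowers timeouts unconditionally. */
class NaiveSessionTable {
    static final int NEGOTIATED_TIMEOUT_MS = 30_000;  // assumed negotiated timeout
    static final int MIN_SESSION_TIMEOUT_MS = 4_000;  // assumed minSessionTimeout

    private final Map<Long, Integer> timeoutMs = new HashMap<>();
    private final Map<Long, Long> lastTouchMs = new HashMap<>();

    /** A connect or ping: record the time of the last activity. */
    void touch(long sessionId, long nowMs) {
        timeoutMs.putIfAbsent(sessionId, NEGOTIATED_TIMEOUT_MS);
        lastTouchMs.put(sessionId, nowMs);
    }

    /** A follower reports a dead socket: lower the timeout without checking who asks. */
    void onSocketError(long sessionId) {
        timeoutMs.computeIfPresent(sessionId, (id, old) -> Math.min(old, MIN_SESSION_TIMEOUT_MS));
    }

    /** True if the session is unknown or has been idle longer than its timeout. */
    boolean isExpired(long sessionId, long nowMs) {
        Integer timeout = timeoutMs.get(sessionId);
        Long last = lastTouchMs.get(sessionId);
        return timeout == null || last == null || nowMs - last > timeout;
    }

    /** Replays steps 1) to 5) and reports whether the live session was expired early. */
    static boolean replayRace() {
        NaiveSessionTable leader = new NaiveSessionTable();
        long session = 1L;
        leader.touch(session, 0);       // 1) client connected via follower1
                                        // 2) client closes that connection
        leader.touch(session, 1_000);   // 3) client reconnects via follower2
        leader.onSocketError(session);  // 4) follower1 notices the dead socket late
        return leader.isExpired(session, 6_000); // 5) idle 5s, longer than min timeout
    }

    public static void main(String[] args) {
        System.out.println("expired early: " + replayRace());
    }
}
```

With these assumed numbers the session is idle for 5 seconds, well inside the negotiated 30 seconds, yet it is expired because step 4) arrived after step 3).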
