Ben, could you explain a bit more why you think this won't work? I'm trying to 
decide if I should put in the work to take the POC I wrote and complete it, but 
I don't really want to waste my time if there's a fundamental reason it's a bad 
idea.

Thanks,
Camille

-----Original Message-----
From: Benjamin Reed [mailto:br...@yahoo-inc.com] 
Sent: Wednesday, September 08, 2010 4:03 PM
To: zookeeper-user@hadoop.apache.org
Subject: Re: closing session on socket close vs waiting for timeout

unfortunately, that only works on the standalone server.

ben

On 09/08/2010 12:52 PM, Fournier, Camille F. [Tech] wrote:
> This would be the ideal solution to this problem I think.
> Poking around the (3.3) code to figure out how hard it would be to implement, 
> I figure one way to do it would be to modify the session timeout to the min 
> session timeout and touch the connection before calling close when you get 
> certain exceptions in NIOServerCnxn.doIO. I did this (removing the code in 
> touch session that returns if the tickTime is greater than the expire time) 
> and it worked (in the standalone server anyway). Interesting solution, or
> total hack that will not work beyond the most basic test case?
>
> C
>
> (forgive lack of actual code in this email)
>
> -----Original Message-----
> From: Ted Dunning [mailto:ted.dunn...@gmail.com]
> Sent: Tuesday, September 07, 2010 1:11 PM
> To: zookeeper-user@hadoop.apache.org
> Cc: Benjamin Reed
> Subject: Re: closing session on socket close vs waiting for timeout
>
> This really is, just as Ben says, a problem of false positives and false
> negatives in detecting session expiration.
>
> On the other hand, the current algorithm isn't really using all the
> information available.  The current algorithm is using time since last
> client-initiated heartbeat.  The new proposal is somewhat worse in that it
> proposes to use just the boolean "has-TCP-disconnect-happened".
>
> Perhaps it would be better to use multiple features in order to decrease
> both false positives and false negatives.
>
> For instance, I could imagine that we use the following features:
>
> - time since last client heartbeat or disconnect or reconnect
>
> - what was the last event? (a heartbeat or a disconnect or a reconnect)
>
> Then the expiration algorithm could use a relatively long time since last
> heartbeat and a relatively short time since last disconnect to mark a
> session as disconnected.
>
> Wouldn't this avoid expiration during GC and cluster partition and cause
> expiration quickly after a client disconnect?
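A minimal sketch of the two-threshold idea described above, assuming the server tracks only the last event and its timestamp per session. The names (`SessionState`, `is_expired`) and the threshold values are hypothetical illustrations, not ZooKeeper code or configuration:

```python
from dataclasses import dataclass
from enum import Enum


class Event(Enum):
    HEARTBEAT = "heartbeat"
    DISCONNECT = "disconnect"
    RECONNECT = "reconnect"


@dataclass
class SessionState:
    last_event: Event        # what was the last event?
    last_event_time: float   # when it happened, in seconds


# Hypothetical thresholds, chosen only for illustration.
HEARTBEAT_TIMEOUT = 30.0    # relatively long: tolerates GC pauses and partitions
DISCONNECT_TIMEOUT = 2.0    # relatively short: reacts quickly to a closed socket


def is_expired(state: SessionState, now: float) -> bool:
    """Pick the timeout based on the last event seen for the session."""
    elapsed = now - state.last_event_time
    if state.last_event is Event.DISCONNECT:
        return elapsed > DISCONNECT_TIMEOUT
    # A heartbeat or a reconnect both indicate a live client.
    return elapsed > HEARTBEAT_TIMEOUT
```

With these assumed values, a session whose last event was a disconnect 3 seconds ago is expired, while one whose last event was a heartbeat 3 seconds ago is not. A reconnect within the short window resets the last event, so the session survives.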
>
>
> On Mon, Sep 6, 2010 at 11:26 PM, Patrick Hunt<ph...@apache.org>  wrote:
>
>> That's a good point, however with suitable documentation, warnings and such
>> it seems like a reasonable feature to provide for those users who require
>> it. Used in moderation it seems fine to me. Perhaps we also make it
>> configurable at the server level for those administrators/ops who don't
>> want to deal with it (disable the feature entirely, or only enable on
>> particular servers, etc...).
>>
>> Patrick
>>
>> On Mon, Sep 6, 2010 at 2:10 PM, Benjamin Reed<br...@yahoo-inc.com>  wrote:
>>
>>> if this mechanism were used very often, we would get a huge number of
>>> session expirations when a server fails. you are trading fast error
>>> detection for the ability to tolerate temporary network and server
>>> outages.
>>> to be honest this seems like something that in theory sounds like it will
>>> work in practice, but once deployed we start getting session expirations
>>> for cases that we really do not want or expect.
>>>
>>> ben
>>>
>>>
>>> On 09/01/2010 12:47 PM, Patrick Hunt wrote:
>>>
>>>> Ben, in this case the session would be tied directly to the connection,
>>>> we'd explicitly deny session re-establishment for this session type (so
>>>> 4 would fail). Would that address your concern, others?
>>>>
>>>> Patrick
>>>>
>>>> On 09/01/2010 10:03 AM, Benjamin Reed wrote:
>>>>
>>>>
>>>>> i'm a bit skeptical that this is going to work out properly. a server
>>>>> may receive a socket reset even though the client is still alive:
>>>>>
>>>>> 1) client sends a request to a server
>>>>> 2) client is partitioned from the server
>>>>> 3) server starts trying to send response
>>>>> 4) client reconnects to a different server
>>>>> 5) partition heals
>>>>> 6) server gets a reset from client
>>>>>
>>>>> at step 6 i don't think you want to delete the ephemeral nodes.
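The six steps above can be traced with a toy model. This is only a sketch under stated assumptions: `Session`, `on_socket_reset`, and `run_scenario` are hypothetical names, the servers are plain strings, and nothing here reflects ZooKeeper's actual server code. It contrasts a policy that expires the session on any socket reset with one that only drops the stale connection:

```python
class Session:
    def __init__(self, session_id):
        self.session_id = session_id
        self.connected_to = None                  # server currently serving the client
        self.ephemerals = {"/locks/worker-1"}     # hypothetical ephemeral node
        self.expired = False


def on_socket_reset(session, server, expire_on_reset):
    """Handle a reset seen by `server` on its connection for `session`."""
    if expire_on_reset:
        # Proposed behaviour: a closed socket ends the session immediately.
        session.expired = True
        session.ephemerals.clear()
    elif session.connected_to == server:
        # Timeout-based behaviour: only mark the session as disconnected.
        session.connected_to = None


def run_scenario(expire_on_reset):
    s = Session(1)
    s.connected_to = "A"    # 1) client sends a request to server A
    # 2) client is partitioned from server A
    # 3) server A starts trying to send the response
    s.connected_to = "B"    # 4) client reconnects to a different server
    # 5) partition heals
    on_socket_reset(s, "A", expire_on_reset)    # 6) server A gets a reset
    return s
```

In this model, `run_scenario(True)` ends with the session expired and its ephemeral node gone even though the client is alive on server B, while `run_scenario(False)` leaves the session and its node intact.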
>>>>>
>>>>> ben
>>>>>
>>>>> On 08/31/2010 01:41 PM, Fournier, Camille F. [Tech] wrote:
>>>>>
>>>>>
>>>>>> Yes that's right. Which network issues can cause the socket to close
>>>>>> without the initiating process closing the socket? In my limited
>>>>>> experience in this area network issues were more prone to leave dead
>>>>>> sockets open rather than vice versa so I don't know what to look out for.
>>>>>>
>>>>>> Thanks,
>>>>>> Camille
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Dave Wright [mailto:wrig...@gmail.com]
>>>>>> Sent: Tuesday, August 31, 2010 1:14 PM
>>>>>> To: zookeeper-user@hadoop.apache.org
>>>>>> Subject: Re: closing session on socket close vs waiting for timeout
>>>>>>
>>>>>> I think he's saying that if the socket closes because of a crash (i.e.
>>>>>> not a normal zookeeper close request) then the session stays alive until
>>>>>> the session timeout, which is of course true since ZK allows reconnection
>>>>>> and resumption of the session in case of disconnect due to network issues.
>>>>>>
>>>>>> -Dave Wright
>>>>>>
>>>>>> On Tue, Aug 31, 2010 at 1:03 PM, Ted Dunning<ted.dunn...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> That doesn't sound right to me.
>>>>>>>
>>>>>>> Is there a Zookeeper expert in the house?
>>>>>>>
>>>>>>> On Tue, Aug 31, 2010 at 8:58 AM, Fournier, Camille F. [Tech]<
>>>>>>> camille.fourn...@gs.com>   wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> I foolishly did not investigate the ZK code closely enough and it
>>>>>>>> seems that closing the socket still waits for the session timeout to
>>>>>>>> remove the session.
