I'll change the log level and take a look at the server logs next
time I start seeing the error.  Up to that point, I don't recall
seeing any timeouts on the client side before the session expiration
errors.
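
For reference, raising the log level usually comes down to two lines in
log4j.properties. This is only a minimal fragment, assuming the console
appender is named CONSOLE (an assumption; use whatever appender name the
local file actually defines). It needs to be applied on both the servers
and the clients.

```properties
# Assumed appender name: CONSOLE. Adjust to match the local log4j.properties.
log4j.rootLogger=DEBUG, CONSOLE
log4j.appender.CONSOLE.Threshold=DEBUG
```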

Either way I've modified my code to create a new client instance if
there is a fatal exception during leader election, which should help
recovery from the session timeout.
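
A rough sketch of the restart idea, using only the JDK so it runs without
ZooKeeper on the classpath. It shows a ThreadGroup that catches a fatal
unchecked exception from a worker thread and starts a fresh worker in the
same group. The names FatalSessionException, RestartingThreadGroup and
RestartDemo are hypothetical, and the simulated failure stands in for a
session-expired KeeperException; in real code the factory would hand back
a worker wired to a newly created ZK client.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical unchecked exception a worker throws when its session is unusable. */
class FatalSessionException extends RuntimeException {
    FatalSessionException(String message) { super(message); }
}

/** ThreadGroup that replaces a worker thread killed by a FatalSessionException. */
class RestartingThreadGroup extends ThreadGroup {
    private final Supplier<Runnable> workerFactory; // builds a fresh worker each time
    private final AtomicInteger restartsLeft;       // cap, so a permanent failure cannot loop forever

    RestartingThreadGroup(String name, Supplier<Runnable> workerFactory, int maxRestarts) {
        super(name);
        this.workerFactory = workerFactory;
        this.restartsLeft = new AtomicInteger(maxRestarts);
    }

    /** Starts a new worker thread inside this group. */
    Thread startWorker() {
        Thread t = new Thread(this, workerFactory.get(), getName() + "-worker");
        t.start();
        return t;
    }

    /** Called by the JVM when a thread in this group dies from an uncaught exception. */
    @Override
    public void uncaughtException(Thread t, Throwable e) {
        if (e instanceof FatalSessionException && restartsLeft.getAndDecrement() > 0) {
            startWorker(); // replacement joins the same group
        } else {
            super.uncaughtException(t, e); // default handling for anything else
        }
    }
}

class RestartDemo {
    /** Workers fail 'failures' times, then succeed; returns how many workers were started. */
    static int run(int failures) {
        AtomicInteger started = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(1);
        Supplier<Runnable> factory = () -> () -> {
            int attempt = started.incrementAndGet();
            if (attempt <= failures) {
                throw new FatalSessionException("simulated session expiry, attempt " + attempt);
            }
            done.countDown(); // this worker kept its session and finished normally
        };
        RestartingThreadGroup group = new RestartingThreadGroup("zk-clients", factory, failures);
        group.startWorker();
        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        return started.get();
    }
}
```

Calling RestartDemo.run(2) starts one worker, lets it fail twice, and
returns once the third worker completes. The restart cap is a design
choice of this sketch, not something ZooKeeper requires.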

Env is Ubuntu 8.10, JRE 1.6.0_11 x64.  Three local QuorumPeerMain
instances.  I'll reply again when I have more info.

Thanks again guys for your help.
-Tom


2009/2/12 Patrick Hunt <ph...@apache.org>:
> Tom, you might try changing the log4j default log level to DEBUG for the 
> rootlogger and appender if you have not already done so (servers and clients 
> both). You'll get more information to aid debugging if it does occur again.
> http://hadoop.apache.org/zookeeper/docs/r3.0.1/zookeeperAdmin.html#sc_logging
>
> Also, are you seeing timeouts on the client, or just session expiration on 
> the server?
>
> The stat command, detailed here, may also be of use to you:
> http://hadoop.apache.org/zookeeper/docs/r3.0.1/zookeeperAdmin.html#sc_zkCommands
>
> Knowing more about your env, OS & java version in particular, would also help 
> us help you narrow things down. :-)
>
> Patrick
>
> Tom Nichols wrote:
>>
>> On Thu, Feb 12, 2009 at 4:11 PM, Benjamin Reed <br...@yahoo-inc.com> wrote:
>>>
>>> idleness is not a problem. the client library sends heartbeats to keep the 
>>> session alive. the client library will also handle reconnects automatically 
>>> if a server dies.
>>
>> That's odd then that I'm seeing this problem.  I have a local, 3-node
>> zookeeper quorum, and I have 3 instances of the client also running on
>> the same box.  The session expiry doesn't seem to be in response to
>> any severe load on the machine or anything like that.  I'll keep an
>> eye on it and see if I can't reproduce the behavior in a distributed
>> environment.
>>
>> I've realized a relatively easy way to deal with this problem -- I can
>> let my thread throw a fatal unchecked exception and then use a
>> ThreadGroup implementation that catches the exception.  This in turn
>> spawns a new client thread and adds it back to the same threadGroup.
>>
>> Thanks again guys.
>> -Tom
>>
>>
>>> since session expiration really is a rare catastrophic event (or at least
>>> it should be), it is probably easiest to deal with it by starting with a
>>> fresh instance if your session expires.
>>>
>>> ben
>>> ________________________________________
>>> From: Tom Nichols [tmnich...@gmail.com]
>>> Sent: Thursday, February 12, 2009 11:53 AM
>>> To: zookeeper-user@hadoop.apache.org
>>> Subject: Re: Dealing with session expired
>>>
>>> I'm using a timeout of 5000ms.  Now let me ask this:  Suppose all of
>>> my clients are waiting on some external event -- not ZooKeeper -- so
>>> they are all idle and are not touching ZK nodes, nor are they calling
>>> exists, getChildren, etc etc.  Can that idleness cause session expiry?
>>>
>>> I'm running a local quorum of 3 nodes.  That is, I have an Ant script
>>> that kicks off 3 <java> tasks in parallel to run QuorumPeerMain,
>>> each with its own config file.
>>>
>>> Regarding handling of the failure, I suspect I will just have to
>>> reinitialize by creating a new instance of my client(s) that
>>> themselves will have a new ZK instance.  I'm using Spring to wire
>>> everything together, which is why it's particularly difficult to
>>> simply re-create a new ZK instance and pass it to the classes using it
>>> (those classes have no knowledge of each other).  But I _can_ just
>>> pull a freshly-created (prototype) instance from the Spring
>>> application context, which is where a new ZK client will be wired in.
>>>
>>> The only ramification there is I have to throw the KeeperException as
>>> a fatal exception rather than letting that client try to re-elect.  Or
>>> maybe add in some logic to say "if I can't re-elect, _then_ throw an
>>> exception and consider it fatal."
>>>
>>> Thanks guys.
>>>
>>> -Tom
>>>
>>>
>>> On Thu, Feb 12, 2009 at 2:39 PM, Patrick Hunt <ph...@apache.org> wrote:
>>>>
>>>> Regardless of frequency Tom's code still has to handle this situation.
>>>>
>>>> I would suggest that the "two classes" Tom is referring to in his mail, the
>>>> ones that use the ZK client object, should either be able to "reinitialize"
>>>> with a new zk session, or they themselves should be discarded and new
>>>> instances created using the new session (not sure what makes more sense for
>>>> his archi...)
>>>>
>>>> Regardless of whether we reuse the session object or create a new one I
>>>> believe the code using the session needs to "reinitialize" in some way --
>>>> there's been a dramatic break from the cluster.
>>>>
>>>> As I mentioned, you can decrease the likelihood of expiration by increasing
>>>> the timeout - but the downside is that you are less sensitive to clients
>>>> dying (because their ephemeral nodes don't get deleted till close/expire,
>>>> and if you are doing something like leader election among your clients it
>>>> will take longer for the followers to be notified).
>>>>
>>>> Patrick
>>>>
>>>> Mahadev Konar wrote:
>>>>>
>>>>> Hi Tom,
>>>>>  The session expired event means that the server expired the client, and
>>>>> that means the watches and ephemerals will go away for that node.
>>>>>
>>>>> How are you running your zookeeper quorum? Session expiry should be a
>>>>> really rare event. If you have a quorum of servers it should rarely happen.
>>>>>
>>>>> mahadev
>>>>>
>>>>>
>>>>> On 2/12/09 11:17 AM, "Tom Nichols" <tmnich...@gmail.com> wrote:
>>>>>
>>>>>> So if a session expires, my ephemeral nodes and watches have already
>>>>>> disappeared?  I suppose creating a new ZK instance with the old
>>>>>> session ID would not do me any good in that case.  Correct?
>>>>>>
>>>>>> Thanks.
>>>>>> -Tom
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Feb 12, 2009 at 2:12 PM, Mahadev Konar <maha...@yahoo-inc.com> wrote:
>>>>>>>
>>>>>>> Hi Tom,
>>>>>>>  We prefer to discard the zookeeper instance if a session expires.
>>>>>>> Maintaining a one to one relationship between a client handle and a
>>>>>>> session makes it much simpler for users to understand the existence and
>>>>>>> disappearance of ephemeral nodes and watches created by a zookeeper
>>>>>>> client.
>>>>>>>
>>>>>>> thanks
>>>>>>> mahadev
>>>>>>>
>>>>>>>
>>>>>>> On 2/12/09 10:58 AM, "Tom Nichols" <tmnich...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I've come across the situation where a ZK instance will have an
>>>>>>>> expired connection and therefore all operations fail.  Now AFAIK the
>>>>>>>> only way to recover is to create a new ZK instance with the old
>>>>>>>> session ID, correct?
>>>>>>>>
>>>>>>>> Now, my problem is, the ZK instance may be shared -- not between
>>>>>>>> threads -- but maybe two classes in the same thread synchronize on
>>>>>>>> different nodes by using different watchers.  So it makes sense that
>>>>>>>> one ZK client instance can handle this.  Except that even if I detect
>>>>>>>> the session expiration by catching the KeeperException, if I want to
>>>>>>>> "resume" the session, I have to create a new ZK instance and pass it
>>>>>>>> to any classes who were previously sharing the same instance.  Does
>>>>>>>> this make sense so far?
>>>>>>>>
>>>>>>>> Anyway, bottom line is, it would be nice if a ZK instance could itself
>>>>>>>> recover a session rather than discarding that instance and creating a
>>>>>>>> new one.
>>>>>>>>
>>>>>>>> Thoughts?
>>>>>>>>
>>>>>>>> Thanks in advance,
>>>>>>>>
>>>>>>>> -Tom
>
