Dealing with session expired

2009-02-12 Thread Tom Nichols
I've come across the situation where a ZK instance will have an
expired connection and therefore all operations fail.  Now AFAIK the
only way to recover is to create a new ZK instance with the old
session ID, correct?

Now, my problem is, the ZK instance may be shared -- not between
threads -- but maybe two classes in the same thread synchronize on
different nodes by using different watchers.  So it makes sense that
one ZK client instance can handle this.  Except that even if I detect
the session expiration by catching the KeeperException, if I want to
resume the session, I have to create a new ZK instance and pass it
to any classes that were previously sharing the same instance.  Does
this make sense so far?

Anyway, bottom line is, it would be nice if a ZK instance could itself
recover a session rather than discarding that instance and creating a
new one.

Thoughts?

Thanks in advance,

-Tom


Re: Dealing with session expired

2009-02-12 Thread Mahadev Konar
Hi Tom,
  We prefer to discard the zookeeper instance if a session expires.
Maintaining a one to one relationship between a client handle and a session
makes it much simpler for users to understand the existence and
disappearance of ephemeral nodes and watches created by a zookeeper client.

thanks
mahadev





Re: Dealing with session expired

2009-02-12 Thread Tom Nichols
So if a session expires, my ephemeral nodes and watches have already
disappeared?  I suppose creating a new ZK instance with the old
session ID would not do me any good in that case.  Correct?

Thanks.
-Tom







Re: Dealing with session expired

2009-02-12 Thread Mahadev Konar
Hi Tom,
 The session expired event means that the server expired the client, and
that means the watches and ephemerals will go away for that node.

How are you running your zookeeper quorum? A session expiry event should be
a really rare event. If you have a quorum of servers it should rarely happen.

mahadev


 
 



Re: Dealing with session expired

2009-02-12 Thread Patrick Hunt
Ephemerals and watches are maintained across disconnect/reconnect between
the client and server; however, session expiration (or closing the session
explicitly) will trigger deletion of ephemeral nodes associated with the
session.


Right - once the session is expired the id is invalid. You need to
create a new session (new id).


Btw, the timeout value you provide when constructing the zookeeper
client session directly affects the session expiration - the server uses
this timeout as the session expiration time.


Patrick
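
The difference between a disconnect and an expiration, as described above, can be sketched in plain Java. This is only an illustration: the `SessionState` enum and the `SessionStateHandler` class are hypothetical stand-ins and not part of the ZooKeeper API. With the real client, the state changes would presumably arrive through the watcher registered on the client handle.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the client's connection states discussed in the thread. */
enum SessionState { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

/** Maps each state change to what the application should do next. */
class SessionStateHandler {
    enum Action { RESUME, WAIT_FOR_RECONNECT, REBUILD_SESSION }

    private final List<Action> history = new ArrayList<>();

    Action onStateChange(SessionState state) {
        Action action;
        switch (state) {
            case SYNC_CONNECTED:
                // Connected (again): ephemerals and watches are still in place.
                action = Action.RESUME;
                break;
            case DISCONNECTED:
                // The session is still alive on the server; the client library reconnects.
                action = Action.WAIT_FOR_RECONNECT;
                break;
            case EXPIRED:
                // The session id is invalid and the ephemerals are gone: start over.
                action = Action.REBUILD_SESSION;
                break;
            default:
                throw new IllegalArgumentException("unknown state: " + state);
        }
        history.add(action);
        return action;
    }

    /** Actions taken so far, oldest first. */
    List<Action> history() {
        return history;
    }
}
```

Only the `EXPIRED` branch requires a new handle; the other two leave the existing session untouched.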





Re: Dealing with session expired

2009-02-12 Thread Patrick Hunt

Regardless of frequency Tom's code still has to handle this situation.

I would suggest that the two classes Tom is referring to in his mail, 
the ones that use ZK client object, should either be able to 
reinitialize with a new zk session, or they themselves should be 
discarded and new instances created using the new session (not sure what 
makes more sense for his archi...)


Regardless of whether we reuse the session object or create a new one I 
believe the code using the session needs to reinitialize in some way 
-- there's been a dramatic break from the cluster.


As I mentioned, you can decrease the likelihood of expiration by 
increasing the timeout - but the downside is that you are less sensitive 
to clients dying (because their ephemeral nodes don't get deleted till 
close/expire and if you are doing something like leader election among 
your clients it will take longer for the followers to be notified).


Patrick
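
One way to let several classes share a handle and still reinitialize after an expiration is a small holder object, sketched below in stdlib-only Java. The `SharedSessionHolder` name and its methods are hypothetical, and the handle type `H` is a placeholder: with the real client the factory would presumably construct a new ZooKeeper handle (an assumption, not shown here).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

/**
 * Holds the one client handle shared by several classes and replaces it
 * after a session expiration. H is a placeholder for the handle type.
 */
class SharedSessionHolder<H> {
    private final Supplier<H> factory;                       // creates a brand-new session
    private final List<Consumer<H>> users = new ArrayList<>();
    private H current;

    SharedSessionHolder(Supplier<H> factory) {
        this.factory = factory;
        this.current = factory.get();
    }

    /** Registers a user and initializes it against the current handle. */
    synchronized void register(Consumer<H> reinitialize) {
        users.add(reinitialize);
        reinitialize.accept(current);
    }

    synchronized H get() {
        return current;
    }

    /** Call on expiration: create a new session, then let every user rebuild its state. */
    synchronized void onSessionExpired() {
        current = factory.get();
        for (Consumer<H> user : users) {
            user.accept(current);   // e.g. re-create ephemeral nodes, re-register watches
        }
    }
}
```

The classes using the handle never hold it directly; each one supplies a callback that rebuilds its ephemeral nodes and watches against whatever handle it is given, so they need no knowledge of each other.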







RE: Dealing with session expired

2009-02-12 Thread Benjamin Reed
idleness is not a problem. the client library sends heartbeats to keep the 
session alive. the client library will also handle reconnects automatically if 
a server dies.

since session expiration really is a rare catastrophic event (or at least it
should be), it is probably easiest to deal with it by starting with a fresh
instance if your session expires.

ben

From: Tom Nichols [tmnich...@gmail.com]
Sent: Thursday, February 12, 2009 11:53 AM
To: zookeeper-user@hadoop.apache.org
Subject: Re: Dealing with session expired

I'm using a timeout of 5000ms.  Now let me ask this:  Suppose all of
my clients are waiting on some external event -- not ZooKeeper -- so
they are all idle and are not touching ZK nodes, nor are they calling
exists, getChildren, etc etc.  Can that idleness cause session expiry?

I'm running a local quorum of 3 nodes.  That is, I have an Ant script
that kicks off 3 java tasks in parallel to run ConsumerPeerMain,
each with its own config file.

Regarding handling of the failure, I suspect I will just have to
reinitialize by creating a new instance of my client(s) that
themselves will have a new ZK instance.  I'm using Spring to wire
everything together, which is why it's particularly difficult to
simply re-create a new ZK instance and pass it to the classes using it
(those classes have no knowledge of each other).  But I _can_ just
pull a freshly-created (prototype) instance from the Spring
application context, which is where a new ZK client will be wired in.

The only ramification there is I have to throw the KeeperException as
a fatal exception rather than letting that client try to re-elect.  Or
maybe add in some logic to say if I can't re-elect, _then_ throw an
exception and consider it fatal.

Thanks guys.

-Tom
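
The "try to re-elect, and only then treat it as fatal" idea could look roughly like the following stdlib-only Java sketch. `FatalSessionException` and `ElectionRetry` are hypothetical names; the `election` callable stands in for whatever re-election logic the client runs, and with the real client the caught exception would be a KeeperException. Note that retrying only helps with transient failures: as stated elsewhere in the thread, an expired session id stays invalid, so retries on the same handle cannot bring it back.

```java
import java.util.concurrent.Callable;

/** Unchecked exception signalling that the client must be rebuilt from scratch. */
class FatalSessionException extends RuntimeException {
    FatalSessionException(String message, Throwable cause) {
        super(message, cause);
    }
}

class ElectionRetry {
    /** Runs the election up to maxAttempts times, then escalates to a fatal exception. */
    static <T> T electOrFail(Callable<T> election, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return election.call();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // preserve the interrupt for callers
                throw new FatalSessionException("interrupted during election", e);
            } catch (Exception e) {
                last = e;                             // remember the failure and try again
            }
        }
        throw new FatalSessionException(
                "re-election failed after " + maxAttempts + " attempts", last);
    }
}
```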







Hadoop User Group Meeting (Bay Area) 2/18

2009-02-12 Thread Ajay Anand
The next Bay Area Hadoop User Group meeting is scheduled for Wednesday,
February 18th at Yahoo! 2811 Mission College Blvd, Santa Clara, Building
2, Training Rooms 5 & 6 from 6:00-7:30 pm.

Agenda:

Fair Scheduler for Hadoop - Matei Zaharia

Interfacing with MySQL - Aaron Kimball

Registration: http://upcoming.yahoo.com/event/1776616/

As always, suggestions for topics for future meetings are welcome.
Please send them to me directly at aan...@yahoo-inc.com

Look forward to seeing you there!

Ajay



Re: Dealing with session expired

2009-02-12 Thread Patrick Hunt
Tom, you might try changing the log4j default log level to DEBUG for the 
rootlogger and appender if you have not already done so (servers and 
clients both). You'll get more information to aid debugging if it does 
occur again.

http://hadoop.apache.org/zookeeper/docs/r3.0.1/zookeeperAdmin.html#sc_logging

Also, are you seeing timeouts on the client, or just session expiration 
on the server?


The stat command, detailed here, may also be of use to you:
http://hadoop.apache.org/zookeeper/docs/r3.0.1/zookeeperAdmin.html#sc_zkCommands

Knowing more about your env, OS & java version in particular, would also
help us help you narrow things down. :-)


Patrick

Tom Nichols wrote:

On Thu, Feb 12, 2009 at 4:11 PM, Benjamin Reed br...@yahoo-inc.com wrote:

idleness is not a problem. the client library sends heartbeats to keep the 
session alive. the client library will also handle reconnects automatically if 
a server dies.


That's odd then that I'm seeing this problem.  I have a local, 3-node
zookeeper quorum, and I have 3 instances of the client also running on
the same box.  The session expiry doesn't seem to be in response to
any severe load on the machine or anything like that.  I'll keep an
eye on it and see if I can't reproduce the behavior in a distributed
environment.

I've realized a relatively easy way to deal with this problem -- I can
let my thread throw a fatal unchecked exception and then use a
ThreadGroup implementation that catches the exception.  This in turn
spawns a new client thread and adds it back to the same threadGroup.

Thanks again guys.
-Tom
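
The ThreadGroup approach Tom describes might be sketched as follows, using only the standard library. `RestartingThreadGroup`, `startClient`, and the `maxRestarts` cap are hypothetical names and choices made for this illustration; the `clientTask` is assumed to build its own fresh ZK session each time it runs.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Thread group that replaces a client thread that died from an uncaught exception. */
class RestartingThreadGroup extends ThreadGroup {
    private final Runnable clientTask;   // assumed to create its own new session on each run
    private final int maxRestarts;       // cap, so a permanently broken client cannot loop forever
    private final AtomicInteger failures = new AtomicInteger();

    RestartingThreadGroup(String name, Runnable clientTask, int maxRestarts) {
        super(name);
        this.clientTask = clientTask;
        this.maxRestarts = maxRestarts;
    }

    /** Starts a client thread inside this group. */
    Thread startClient() {
        Thread t = new Thread(this, clientTask, "zk-client-" + failures.get());
        t.start();
        return t;
    }

    /** Invoked by the JVM when a thread in this group dies from an uncaught exception. */
    @Override
    public void uncaughtException(Thread t, Throwable e) {
        if (failures.incrementAndGet() <= maxRestarts) {
            startClient();                    // spawn a replacement in the same group
        } else {
            super.uncaughtException(t, e);    // give up and fall back to default handling
        }
    }

    int failureCount() {
        return failures.get();
    }
}
```

A client thread that hits a session expiration would throw an unchecked exception and simply end; the group then starts a replacement, which begins with a fresh session.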


