Re: auto-reconnection ZooKeeper proxy?

2008-07-22 Thread James Strachan
I've been experimenting with the WriteLock implementation to deal with
server failure; I've found that creating a reconnecting ZooKeeper proxy
is maybe too simplistic. Instead I'm just making it easy to retry
operations (or arbitrary ZK code blocks) using a helper class
(currently called ProtocolSupport, but I'm open to suggestions for a
better name for a base class for higher level protocol
implementations).

Using the WriteLock as an example, it seems you often want the retry
logic to cover a number of calls to ZooKeeper (e.g. check if a
znode exists and, if it doesn't, try to create it), retrying the whole
thing when ZK exceptions like connection loss occur.
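
To make the "retry the whole block" idea concrete, here is a minimal
sketch. It is not the ProtocolSupport code from the patch: RetrySupport,
ConnectionLostException, the attempt limit and the linear back-off are
all hypothetical names and choices, and the exception is a local
stand-in so the sketch compiles without the ZooKeeper jar.

```java
import java.util.concurrent.Callable;

// Hypothetical stand-in for a ZooKeeper connection-loss error, so the
// sketch compiles without the ZooKeeper jar.
class ConnectionLostException extends Exception {
    ConnectionLostException(String message) { super(message); }
}

// Hypothetical helper; the class name, the attempt limit and the
// back-off are assumptions, not the API from the patch.
class RetrySupport {
    private final int maxAttempts;
    private final long delayMillis;

    RetrySupport(int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
        this.delayMillis = delayMillis;
    }

    // Runs the whole block again whenever it fails with a connection
    // loss, rethrowing the last failure once the attempts are used up.
    <T> T retry(Callable<T> block) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return block.call();
            } catch (ConnectionLostException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(delayMillis * attempt); // simple linear back-off
            }
        }
    }
}

class RetryDemo {
    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = new RetrySupport(5, 10).retry(() -> {
            // Stands in for "check exists, then create" as one retried unit.
            if (++calls[0] < 3) {
                throw new ConnectionLostException("simulated connection loss");
            }
            return "created";
        });
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Because the block is retried as a unit, it has to be safe to run more
than once (for example, a create that tolerates the node already
existing).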

I'll submit the patch soon to ZOOKEEPER-78 including this...
https://issues.apache.org/jira/browse/ZOOKEEPER-78

One thing I have found is I've managed to get a
SessionExpiredException in my test case (not sure why though; I
thought ZooKeeper automatically kept sending keep alive pings?). I
just wondered what a client should do if that happens; I didn't see
any easy way to effectively disconnect and reconnect a ZooKeeper
client in this case.

I'm assuming that the SessionExpiredException is always gonna be
possible; so I've patched ZooKeeper to allow clients to handle a
SessionExpiredException and force a reconnection (to get a new
session).

So I've created a small patch to add a reconnect() method to ZooKeeper
which just closes and recreates the cnxn object...
https://issues.apache.org/jira/browse/ZOOKEEPER-84
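
As a rough illustration of the "close and recreate the cnxn object"
idea (not the actual ZOOKEEPER-84 patch), the sketch below uses a
hypothetical StubConnection and ReconnectableClient so it runs without
ZooKeeper on the classpath; only the reconnect() name and the
close-then-recreate behaviour come from the description above.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the client's internal connection object.
class StubConnection {
    private boolean open = true;

    void close() { open = false; }

    boolean isOpen() { return open; }
}

// Hypothetical client wrapper; the class and field names are assumptions.
class ReconnectableClient {
    private final Supplier<StubConnection> connectionFactory;
    private StubConnection cnxn;

    ReconnectableClient(Supplier<StubConnection> connectionFactory) {
        this.connectionFactory = connectionFactory;
        this.cnxn = connectionFactory.get();
    }

    // Throws away the old connection and starts a fresh one, which in
    // ZooKeeper terms would mean a brand new session.
    synchronized void reconnect() {
        cnxn.close();
        cnxn = connectionFactory.get();
    }

    synchronized StubConnection connection() {
        return cnxn;
    }
}

class ReconnectDemo {
    public static void main(String[] args) {
        ReconnectableClient client = new ReconnectableClient(StubConnection::new);
        StubConnection old = client.connection();
        client.reconnect();
        System.out.println("old open: " + old.isOpen());
        System.out.println("new open: " + client.connection().isOpen());
    }
}
```

Since a new session starts from scratch, anything tied to the expired
session (ephemeral nodes, watches) would need to be set up again by the
caller after reconnect().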

(I also added a toString() method for easier debugging when running
test cases with multiple clients in the same jvm).

There's maybe a less drastic way to force the re-connection of a
ZooKeeper client, but I figured trashing and recreating the cnxn
object is at least the lowest-risk option and a simple patch :) and the
code should only be executed rarely, so performance isn't such an issue.

Thoughts?


auto-reconnection ZooKeeper proxy?

2008-07-18 Thread James Strachan
<background>
I work on the ActiveMQ project which implements the JMS API - which is
a kinda complex thing but it involves a number of objects
(Connections, Sessions, Producers, Consumers). In some JMS providers
it's the end user's responsibility to deal with detecting a connection
failure (as opposed to any other kind of error) and then recreating
all the dependent objects.

We added support for auto-reconnection which greatly simplifies the
developer's life; it lets the JMS client automatically deal with any
socket failures, reconnecting to a broker for you and re-establishing
all of those in-flight operations (subscriptions, in-progress sends
and so forth).
http://activemq.apache.org/how-can-i-support-auto-reconnection.html

Having seen the value of wrapping up the auto-reconnection within a
proxy, I'm thinking it's also got merits on ZK.
</background>


As we start creating protocols/recipes that implement higher order
features like locks, leader elections and so forth we could probably
do with some kinda auto-reconnecting facade to ZooKeeper just to
simplify the implementation code of protocols/recipes. It's a kinda
complex area though and I'm sure different protocols will want
different things; but even for something as simple as a lock - I can
see the value in an auto-reconnecting proxy.

e.g. there are already 5 different method calls in the current WriteLock
implementation which all really need a custom try/catch around them to
detect loss of the connection, which should then be wrapped in
reconnect-retry logic.

What to do about watches is interesting; for now the current
behaviour seems fine (fire them all, forcing a re-watch), though in the
future we could re-enable watches on the new server connection as an
option.

All I'm thinking about for now is a kinda ReconnectingZooKeeper which
looks like a ZooKeeper object but which internally catches dead
connections and then internally tries to reconnect to one of the ZK
servers under the covers - retrying the current read/write operation
until the ReconnectPolicy says to fail. e.g. some folks might wanna
retry connecting forever; others for a certain amount of time or
certain number of attempts etc.

So something like...

public class ReconnectingZooKeeper extends ZooKeeper {
  ...
  // for each method that reads/writes synchronously
  public Stat exists(String path) {...
    // keep looping until the call succeeds or the policy gives up
    for (int count = 0; ; count++) {
      try {
        // really do the method call!
        return super.exists(path);
      } catch (ConnectionClosedException e) {
        // let any watches or listeners respond to the connection
        // loss first, before we retry
        fireAnyWatchesAndStuff();

        if (!shouldRetry(count)) {
          throw e;
        }
      }
    }
  }
}
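
The ReconnectPolicy mentioned above (retry forever, for a certain time,
or for a certain number of attempts) could look roughly like the sketch
below. Only the ReconnectPolicy name comes from this mail; the method
signature and the factory names are assumptions made for illustration.

```java
// Hypothetical policy interface; the method and factory names are
// assumptions, only the ReconnectPolicy name comes from the mail.
interface ReconnectPolicy {
    // failures: how many attempts have failed so far (1 after the first).
    // elapsedMillis: time since the first attempt started.
    boolean shouldRetry(int failures, long elapsedMillis);

    // Never give up.
    static ReconnectPolicy forever() {
        return (failures, elapsedMillis) -> true;
    }

    // Allow at most max attempts in total.
    static ReconnectPolicy maxAttempts(int max) {
        return (failures, elapsedMillis) -> failures < max;
    }

    // Keep retrying until maxMillis have passed.
    static ReconnectPolicy maxDuration(long maxMillis) {
        return (failures, elapsedMillis) -> elapsedMillis < maxMillis;
    }
}

class ReconnectPolicyDemo {
    public static void main(String[] args) {
        ReconnectPolicy threeTries = ReconnectPolicy.maxAttempts(3);
        System.out.println(threeTries.shouldRetry(1, 0)); // true
        System.out.println(threeTries.shouldRetry(3, 0)); // false

        ReconnectPolicy oneSecond = ReconnectPolicy.maxDuration(1000);
        System.out.println(oneSecond.shouldRetry(10, 999));  // true
        System.out.println(oneSecond.shouldRetry(10, 1000)); // false
    }
}
```

A retry loop like the one in exists() would then ask the policy after
each failure instead of calling a fixed shouldRetry(count).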


Any watches should fire when a connection is lost - and all writes
should be replicated to the new server we connect to, right? So I'm
thinking, if we had a ReconnectingZooKeeper implementation, we could
use it with the current WriteLock implementation so that the protocol
could survive ZK server loss and reconnection while still working.

e.g. on connection loss the leader/lock owner needs to lose the lock
until it gets it back, just in case; but other than that I think it
should work.

I'm sure there are some gremlins somewhere in automatically reconnecting;
though provided the watch mechanism works, clients will be able to do
the right thing I think.

Thoughts?

-- 
James
---
http://macstrac.blogspot.com/

Open Source Integration
http://open.iona.com