ZK recovery questions

2010-07-18, Ashwin Jayaprakash
Hi, I've been reading the docs and trying out some basic ZooKeeper examples.
I have a few simple questions related to recovery.

It would be good to have questions like these on the Wiki/docs, so that noobs
like me don't keep asking the same things over and over.


   - If 1 out of 3 servers crashes and the log files are unrecoverable, how
   do we provision a replacement server?


   - If the server log is recoverable but provisioning takes a long time,
   then what happens if the old log file is far behind the current state? The
   docs say that recovery is based on fuzzy checkpointing and snapshots, but I
   wasn't clear on how long catching up would take.


   - What happens in the client-side code if the server quorum is lost? Does
   the ZK service freeze, or does it continue to serve reads only?
      - If there was a temporary glitch (network or GC) and the replica to
      which the client is connected breaks away from the quorum, does the
      client get notified? Does the replica stop processing client requests?
      Does it rejoin the cluster without manual intervention?
      - Now, if the client cannot connect to any other server either (split
      brain)... well, I suppose this question is moot.


   - Do the servers really have to run with file-based persistence? I saw
   that someone wanted an in-memory mode for unit testing (ZOOKEEPER-694,
   https://issues.apache.org/jira/browse/ZOOKEEPER-694), but there are cases
   where only a transient ZK service is needed. Most enterprise systems have
   replicated databases anyway, so the fear of data loss is minimal. If ZK
   logs are the only means of recovery, then this might be harder to
   implement.


   - A client example with full-fledged error handling would be very useful
   for starters. I'm not sure if http://github.com/sgroschupf/zkclient and
   http://code.google.com/p/cages/ have everything, but they do look
   promising. The plain ZK API is a bit overwhelming :)
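On the quorum question: by default ZooKeeper requires a strict majority of the voting servers, so an ensemble of n servers keeps working as long as floor(n/2) + 1 of them can reach each other. A minimal sketch of that arithmetic (the function names are illustrative, not part of any ZooKeeper API):

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble of voting servers."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Servers that can be lost while a majority can still be formed."""
    return ensemble_size - quorum_size(ensemble_size)


def has_quorum(reachable: int, ensemble_size: int) -> bool:
    """True if the servers that can still talk to each other form a majority."""
    return reachable >= quorum_size(ensemble_size)


for n in (1, 3, 4, 5):
    print(f"{n} servers: quorum {quorum_size(n)}, "
          f"survives {tolerated_failures(n)} failure(s)")
```

With 3 servers, losing one still leaves a quorum of 2; losing two does not, and by default the remaining server stops serving clients rather than risk diverging from the others. Going from 3 to 4 servers buys no extra fault tolerance, which is why odd ensemble sizes are the usual choice.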


Thanks,
Ashwin.


Re: ZK recovery questions

2010-07-18, Ted Dunning
On Sun, Jul 18, 2010 at 3:34 PM, Ashwin Jayaprakash
<ashwin.jayaprak...@gmail.com> wrote:


   - If 1 out of 3 servers crashes and the log files are unrecoverable, how
   do we provision a replacement server?


Just start it and it will download a snapshot from the other servers.
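What "just start it" involves on disk, as a sketch: the replacement reuses the crashed server's id (in the `myid` file inside `dataDir`) and the same ensemble configuration, with an otherwise empty data directory. This assumes the new machine takes over the dead one's host name, so the other servers' configs need no change; the host names and paths are hypothetical, while `tickTime`, `initLimit`, `syncLimit`, `dataDir`, `clientPort` and `server.N` are standard `zoo.cfg` keys.

```python
from typing import List


def render_zoo_cfg(hosts: List[str], data_dir: str, client_port: int = 2181) -> str:
    """Render a zoo.cfg for a replicated ensemble (same file on every server)."""
    lines = [
        "tickTime=2000",
        "initLimit=10",
        "syncLimit=5",
        f"dataDir={data_dir}",
        f"clientPort={client_port}",
    ]
    # server.N=host:quorumPort:electionPort; N must match that host's myid file.
    for server_id, host in enumerate(hosts, start=1):
        lines.append(f"server.{server_id}={host}:2888:3888")
    return "\n".join(lines) + "\n"


def render_myid(hosts: List[str], replacement_host: str) -> str:
    """Contents of <dataDir>/myid: the replacement reuses the dead server's id."""
    return f"{hosts.index(replacement_host) + 1}\n"


# Hypothetical host names; zk3 is the machine being replaced.
hosts = ["zk1.example.com", "zk2.example.com", "zk3.example.com"]
print(render_zoo_cfg(hosts, "/var/lib/zookeeper"), end="")
print(render_myid(hosts, "zk3.example.com"), end="")  # prints 3
```

With no snapshot or log of its own to replay, the new server syncs everything from the current leader when it joins.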



   - If the server log is recoverable but provisioning takes a long time,
   then what happens if the old log file is far behind the current state?


If a server is very far behind, it will download a snapshot as if it knows
nothing.  This rarely takes long.
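A simplified model of how a leader can pick a catch-up strategy for a returning follower, assuming it keeps a window of recently committed transactions in memory. ZooKeeper's leader makes a similar DIFF / TRUNC / SNAP choice, but its real logic (epochs, in-flight proposals) is more involved, and every name below is illustrative:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Txn = Tuple[int, str]  # (zxid, description), simplified


@dataclass
class LeaderState:
    """Leader view: recent committed txns kept in memory plus a full snapshot."""
    committed: List[Txn] = field(default_factory=list)  # ascending zxid
    snapshot: Dict[str, str] = field(default_factory=dict)


def plan_sync(leader: LeaderState, follower_zxid: int):
    """Choose how a (re)joining follower catches up. Simplified model.

    DIFF:  follower is inside the in-memory window; send the txns it lacks.
    TRUNC: follower has txns the leader never committed; roll back to the
           leader's last committed zxid.
    SNAP:  follower is older than the window (or has nothing); send it all.
    """
    if not leader.committed:
        return ("SNAP", leader.snapshot)
    min_zxid, max_zxid = leader.committed[0][0], leader.committed[-1][0]
    if follower_zxid > max_zxid:
        return ("TRUNC", max_zxid)
    if follower_zxid >= min_zxid:
        return ("DIFF", [t for t in leader.committed if t[0] > follower_zxid])
    return ("SNAP", leader.snapshot)


leader = LeaderState(
    committed=[(100, "create /a"), (101, "set /a"), (102, "delete /b")],
    snapshot={"/a": "v2"},
)
print(plan_sync(leader, 101)[0])  # DIFF: only txn 102 is missing
print(plan_sync(leader, 5)[0])    # SNAP: far behind, start from scratch
```

A full snapshot transfer is bounded by the size of the data tree, which ZooKeeper holds entirely in memory and which is normally small; that is consistent with catching up rarely taking long.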


      - If there was a temporary glitch (network or GC) and the replica to
      which the client is connected breaks away from the quorum, does the
      client get notified? Does the replica stop processing client requests?
      Does it rejoin the cluster without manual intervention?


Failures like this are normally invisible to the client.
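In practice "invisible" means the client library hides the failover: when its server drops out, the library reconnects to another server from its connection string, and if that happens within the session timeout the session, its ephemeral nodes and its watches survive. Application code still sees connection-state events (Disconnected, then SyncConnected, or Expired if the gap was too long). A sketch of how an application might react, modeling those states with a local enum named after ZooKeeper's `Watcher.Event.KeeperState` values; `SessionTracker` and its behavior are hypothetical, and a real client would receive these states through a Watcher callback.

```python
from enum import Enum


class KeeperState(Enum):
    """Local stand-in named after ZooKeeper's KeeperState values."""
    SYNC_CONNECTED = "SyncConnected"
    DISCONNECTED = "Disconnected"
    EXPIRED = "Expired"


class SessionTracker:
    """Hypothetical application-side reaction to connection-state events."""

    def __init__(self) -> None:
        self.state = KeeperState.SYNC_CONNECTED
        self.needs_new_session = False
        self.log = []

    def on_event(self, state: KeeperState) -> None:
        self.state = state
        if state is KeeperState.DISCONNECTED:
            # The library is already trying other servers; the session
            # (ephemeral nodes, watches) is still alive, so just pause work.
            self.log.append("pause")
        elif state is KeeperState.SYNC_CONNECTED:
            self.log.append("resume")
        elif state is KeeperState.EXPIRED:
            # Ephemeral nodes and watches are gone: a new handle is needed.
            self.needs_new_session = True
            self.log.append("new session")


tracker = SessionTracker()
for event in (KeeperState.DISCONNECTED, KeeperState.SYNC_CONNECTED):
    tracker.on_event(event)
print(tracker.log, tracker.needs_new_session)  # ['pause', 'resume'] False
```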


   - Do the servers really have to run with file-based persistence? I saw
   that someone wanted an in-memory mode for unit testing (ZOOKEEPER-694,
   https://issues.apache.org/jira/browse/ZOOKEEPER-694), but there are cases
   where only a transient ZK service is needed. Most enterprise systems have
   replicated databases anyway, so the fear of data loss is minimal. If ZK
   logs are the only means of recovery, then this might be harder to
   implement.


ZK is not a replacement for your database and it is really, really nice to
be able to stop it and start it again.  Disk persistence helps with this
enormously.
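Part of why stop-and-start is cheap: on startup a server loads its latest snapshot and replays the transaction log written after it, which is what the "fuzzy checkpointing" in the question refers to. The snapshot is taken while updates keep flowing, so it may already contain some later transactions; ZooKeeper's transactions are idempotent, so replaying them again is harmless. A minimal model of that replay, with a dict standing in for the data tree and a made-up transaction format:

```python
from typing import Dict, List, Optional, Tuple

# (zxid, path, new value); None means the node was deleted. Each txn records
# the resulting state rather than a delta, so applying it twice is harmless.
Txn = Tuple[int, str, Optional[str]]


def apply_txn(tree: Dict[str, str], txn: Txn) -> None:
    _, path, value = txn
    if value is None:
        tree.pop(path, None)
    else:
        tree[path] = value


def recover(snapshot: Dict[str, str], snap_zxid: int, log: List[Txn]) -> Dict[str, str]:
    """Rebuild the in-memory tree on restart: snapshot first, then the log.

    snap_zxid is the last zxid applied when the (fuzzy) snapshot started.
    Later txns may or may not be reflected in it, so every txn after
    snap_zxid is replayed; idempotence makes the overlap safe.
    """
    tree = dict(snapshot)
    for txn in sorted(log, key=lambda t: t[0]):
        if txn[0] > snap_zxid:
            apply_txn(tree, txn)
    return tree


log = [(1, "/a", "x"), (2, "/b", "y"), (3, "/a", "z"), (4, "/b", None)]
# Snapshot begun after zxid 1: it caught txn 3's write to /a but not txn 2.
fuzzy = {"/a": "z"}
print(recover(fuzzy, 1, log))  # {'/a': 'z'}
```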

   ...promising. The plain ZK API is a bit overwhelming :)


In practice, it is really pretty simple.  Try it out.
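Much of that simplicity comes down to one pattern, the kind of handling wrappers such as zkclient aim to take care of: retry an operation when the connection is lost, but not when the session has expired. A minimal sketch; the exception classes are local stand-ins named after ZooKeeper's Java `KeeperException.ConnectionLossException` and `KeeperException.SessionExpiredException`, and `with_retries` is a hypothetical helper, not part of any ZooKeeper client.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ConnectionLoss(Exception):
    """Stand-in for ZooKeeper's ConnectionLossException (retryable)."""


class SessionExpired(Exception):
    """Stand-in for SessionExpiredException (not retried: needs a new session)."""


def with_retries(op: Callable[[], T], attempts: int = 5, base_delay: float = 0.1,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run op, retrying only on connection loss, with exponential backoff.

    Only safe for idempotent operations (reads, unconditional sets): after a
    connection loss the server may already have applied the request.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts - 1):
        try:
            return op()
        except ConnectionLoss:
            sleep(base_delay * (2 ** attempt))
    return op()  # last attempt: any exception propagates to the caller


calls = {"n": 0}


def flaky_read() -> bytes:
    """Fails twice with a connection loss, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionLoss()
    return b"config-v1"


delays = []
print(with_retries(flaky_read, sleep=delays.append))  # prints b'config-v1'
```

Passing `sleep` in as a parameter keeps the backoff observable (and testable) without real waiting. SessionExpired is deliberately not caught, since retrying cannot help once the session's ephemeral nodes and watches are gone.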