On Sun, Jul 18, 2010 at 3:34 PM, Ashwin Jayaprakash <
> - If 1 out of 3 servers crashes and the log files are unrecoverable, how
> do we provision a replacement server?
Just start it and it will download a snapshot from the other servers.
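In the usual static-configuration setup, "just start it" means giving the replacement the same server id as the machine it replaces, the same ensemble list, and an empty data directory. Here is a minimal Python sketch of that provisioning step. It assumes the common zoo.cfg keys and the conventional 2888/3888 ports. The helper name and the example.com hostnames are hypothetical.

```python
import os
import tempfile

def provision_replacement(data_dir, my_id, servers, client_port=2181):
    """Prepare an empty data dir for a replacement server and return zoo.cfg text.

    servers maps server id -> hostname (hypothetical hosts). The replacement
    reuses my_id from the crashed machine; nothing else goes into data_dir,
    so the server starts with no local state and syncs from the leader.
    """
    os.makedirs(data_dir, exist_ok=True)
    # myid tells the server which server.N line in zoo.cfg describes itself.
    with open(os.path.join(data_dir, "myid"), "w") as f:
        f.write(f"{my_id}\n")
    lines = [
        "tickTime=2000",
        "initLimit=10",
        "syncLimit=5",
        f"dataDir={data_dir}",
        f"clientPort={client_port}",
    ]
    # Same ensemble list as the surviving servers: quorum port 2888, election port 3888.
    for sid in sorted(servers):
        lines.append(f"server.{sid}={servers[sid]}:2888:3888")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    base = tempfile.mkdtemp()
    hosts = {1: "zk1.example.com", 2: "zk2.example.com", 3: "zk3.example.com"}
    print(provision_replacement(os.path.join(base, "data"), 3, hosts))
```

With no local snapshot or log, the new server joins the ensemble and pulls its state from the current leader.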
> - If the server log is recoverable but provisioning takes a long time,
> then what happens if the old log file is far behind the current state?
If a server is very far behind, it will download a snapshot as if it knows
nothing. This rarely takes long.
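Roughly, the leader compares the follower's last transaction id (zxid) with the window of recent committed transactions it still has at hand. From that comparison it chooses between sending a diff and sending a full snapshot. The sketch below is a simplified Python model of that decision and not ZooKeeper's actual code. The function and parameter names are made up, and the real logic handles more cases.

```python
from enum import Enum

class SyncMode(Enum):
    DIFF = "diff"    # replay only the transactions the follower is missing
    TRUNC = "trunc"  # follower is ahead of the leader; truncate its log
    SNAP = "snap"    # ship a full snapshot, as if the follower knew nothing

def choose_sync(follower_zxid, min_committed, max_committed):
    """Pick a sync strategy for a follower (simplified model).

    The leader is assumed to keep the committed transactions with zxids in
    [min_committed, max_committed] readily available.
    """
    if follower_zxid > max_committed:
        return SyncMode.TRUNC
    if min_committed <= follower_zxid <= max_committed:
        return SyncMode.DIFF
    # Too far behind, or a brand-new server with zxid 0.
    return SyncMode.SNAP
```

A replacement server with an empty data directory always lands in the last branch, which is why it behaves exactly like a server that has fallen very far behind.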
> - If there was a temporary glitch (network or GC) and the replica to which
> the client is connected breaks away from the quorum, is the client
> notified? Does it stop processing client requests? Does it rejoin the
> cluster without manual intervention?
Failures like this are normally invisible to the client.
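The client library does see a short Disconnected / SyncConnected pair of events. These names match ZooKeeper's KeeperState values. The library then reconnects to another server from its connection string. If it reconnects within the session timeout, the session survives, along with its ephemeral nodes and watches. Below is a toy Python model of that behaviour. The Server/Client classes are hypothetical, and the expiry check is simplified: in ZooKeeper the ensemble, not the client, expires sessions.

```python
import random

class Server:
    def __init__(self, name):
        self.name = name
        # A server cut off from the quorum stops serving and drops its clients.
        self.in_quorum = True

class Client:
    """Toy model of a client that fails over between ensemble members."""

    def __init__(self, servers, session_timeout_ms, seed=0):
        self.servers = list(servers)
        random.Random(seed).shuffle(self.servers)  # spread clients across servers
        self.session_timeout_ms = session_timeout_ms
        self.current = None
        self.session_alive = True
        self.events = []

    def connect(self, disconnected_ms=0):
        """Attach to the first healthy server; disconnected_ms is time spent offline."""
        if disconnected_ms > self.session_timeout_ms:
            self.session_alive = False
            self.events.append("Expired")
            return False
        for server in self.servers:
            if server.in_quorum:
                self.current = server
                self.events.append("SyncConnected")
                return True
        return False  # no healthy server yet; a real client keeps retrying

    def on_server_lost(self, disconnected_ms):
        """Called when the current server drops us; try to reconnect elsewhere."""
        self.current = None
        self.events.append("Disconnected")
        return self.connect(disconnected_ms)

if __name__ == "__main__":
    ensemble = [Server("zk1"), Server("zk2"), Server("zk3")]
    client = Client(ensemble, session_timeout_ms=4000)
    client.connect()
    client.current.in_quorum = False  # e.g. a long GC pause isolates this server
    client.on_server_lost(disconnected_ms=500)
    print(client.current.name, client.session_alive, client.events)
```

The isolated server rejoins on its own once it can talk to the quorum again. Until then, it simply stops accepting clients.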
> - Do the servers really have to run with file-based persistence? I saw
> that someone wanted this in-memory mode for unit testing (ZK …),
> but there are cases where only a transient ZK service is needed. Most
> enterprise systems have replicated databases anyway, so the fear of data
> loss is minimal. If ZK logs are the only means of recovery, then this would
> be harder to implement.
ZK is not a replacement for your database, and it is really, really nice to
be able to stop it and start it again. Disk persistence helps with this.
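Restarts are cheap because each server loads its most recent snapshot on startup and then replays only the transactions logged after it. Here is a minimal Python sketch of that recovery step. The tuple layout and the set/delete operations are hypothetical stand-ins for ZooKeeper's real on-disk formats, which live under dataDir (or dataLogDir).

```python
def recover(snapshot, snapshot_zxid, txn_log):
    """Rebuild in-memory state after a restart (simplified model).

    snapshot: dict of path -> data, as of snapshot_zxid.
    txn_log: list of (zxid, op, path, data) tuples in zxid order, where op is
    the hypothetical "set" (create or update) or "delete".
    Returns the recovered state and the last zxid applied.
    """
    state = dict(snapshot)
    last_zxid = snapshot_zxid
    for zxid, op, path, data in txn_log:
        if zxid <= snapshot_zxid:
            continue  # already reflected in the snapshot
        if op == "set":
            state[path] = data
        elif op == "delete":
            state.pop(path, None)
        else:
            raise ValueError(f"unknown op {op!r}")
        last_zxid = zxid
    return state, last_zxid
```

Only the tail of the log after the snapshot is replayed, so a restarted server is usually back quickly, and it then fetches anything it missed from the leader.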
> … promising. Plain ZK API is a bit overwhelming :)
In practice, it is really pretty simple. Try it out.