On Sun, Jul 18, 2010 at 3:34 PM, Ashwin Jayaprakash <
ashwin.jayaprak...@gmail.com> wrote:

>
>   - If 1 out of 3 servers crashes and the log files are unrecoverable, how
>   do we provision a replacement server?
>

Just start it and it will download a snapshot from the other servers.
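
As a rough illustration of why the other two servers can keep serving while
the replacement catches up: assuming the default majority quorum (not weighted
or hierarchical quorums), 2 of 3 servers is still a majority. The class and
method names below are made up for this sketch and are not part of ZooKeeper.

```java
// Simplified model of majority-quorum arithmetic; not ZooKeeper source code.
// QuorumMath, majority and tolerableFailures are hypothetical names.
final class QuorumMath {
    /** Smallest number of servers that forms a strict majority of the ensemble. */
    static int majority(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    /** Number of servers that can be down while a majority is still reachable. */
    static int tolerableFailures(int ensembleSize) {
        return ensembleSize - majority(ensembleSize);
    }

    public static void main(String[] args) {
        // Print the quorum size and failure tolerance for small ensembles.
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " servers: quorum=" + majority(n)
                    + ", tolerable failures=" + tolerableFailures(n));
        }
    }
}
```

With 3 servers the quorum is 2, so losing one server leaves the ensemble
available, and the replacement only has to sync from the survivors.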


>
>    - If the server log is recoverable but provisioning takes a long time,
>   then what happens if the old log file is far behind the current state?


If a server is very far behind, it will download a full snapshot, just as if
it were starting from nothing.  This rarely takes long.
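
A very simplified sketch of that decision, as I understand it: the leader
compares the returning server's last transaction id (zxid) with the range of
recent transactions it still holds, and falls back to a full snapshot when the
server is older than that range. The real logic has more cases than this, and
the class, method and parameter names here are hypothetical.

```java
// Simplified model of choosing a sync strategy; not ZooKeeper source code.
// SyncChooser, choose and the parameter names are hypothetical.
final class SyncChooser {
    enum SyncMode { DIFF, TRUNC, SNAP }

    /**
     * @param followerZxid    last transaction id the returning server has logged
     * @param minCommittedLog oldest transaction id the leader still keeps for replay
     * @param maxCommittedLog newest committed transaction id on the leader
     */
    static SyncMode choose(long followerZxid, long minCommittedLog, long maxCommittedLog) {
        if (followerZxid > maxCommittedLog) {
            // The follower has transactions the leader never committed: roll them back.
            return SyncMode.TRUNC;
        }
        if (followerZxid >= minCommittedLog) {
            // Close enough: replay only the transactions the follower is missing.
            return SyncMode.DIFF;
        }
        // Too far behind (or empty): send the whole snapshot.
        return SyncMode.SNAP;
    }

    public static void main(String[] args) {
        // A server whose log ends well before the leader's replay window.
        System.out.println(choose(50L, 100L, 200L));
    }
}
```

A freshly provisioned server with no log looks like the far-behind case, which
is why both situations end up with a snapshot download.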


>   - If there was a temporary glitch (n/w or GC) and the replica to which
>   the client is connected breaks away from the quorum does the client get
>   notified? Does it stop processing client requests? Does it rejoin the
>   cluster without manual intervention?
>

Failures like this are normally invisible to the client.


>   - Do the servers really have to run with file based persistence? I saw
>   that someone wanted this in-memory mode for unit testing (ZK 694
>   <https://issues.apache.org/jira/browse/ZOOKEEPER-694>)
>   but there are cases where only a transient ZK service is needed. Most
>   enterprise systems have replicated Databases anyway. So, the fear of data
>   loss is minimal. If ZK logs are the only means of recovery, then this might
>   be harder to implement
>

ZK is not a replacement for your database and it is really, really nice to
be able to stop it and start it again.  Disk persistence helps with this
enormously.

>   promising. Plain ZK API is a bit overwhelming :)
>

In practice, it is really pretty simple.  Try it out.
