2011/4/14 Chang Song <[email protected]>

> You need to understand that most apps can tolerate delay in connect/close,
> but we cannot tolerate ping delay since we are using the ZK heartbeat timeout
> as our sole means of failure detection.
>

What about using multiple ZK clusters for this, then?

But it really sounds like your ZK machines are misconfigured somehow.
Session start/stop isn't any more expensive than znode updates, and a
small ZK cluster can handle tens of thousands of those per second if
set up correctly.

Have you tested a cluster where the machines are set up correctly with
separate snapshot and log disks?
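
For reference, a minimal zoo.cfg sketch of that layout. The two mount
points are hypothetical; the point is only that dataDir (snapshots) and
dataLogDir (transaction log) sit on different physical disks:

    # zoo.cfg (paths are assumed, adjust to your machines)
    # snapshots
    dataDir=/disk1/zookeeper/data
    # transaction log, ideally on a dedicated disk
    dataLogDir=/disk2/zookeeper/txlog

The transaction log is synced to disk before a write is acknowledged, so
sharing its disk with snapshots or other I/O tends to show up as latency.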

Are your ZK machines doing any other tasks?


> We use 15 seconds (5 sec for each ensemble) for session timeout.
> Important servers will drop out of the clusters even if they are not
> malfunctioning; in some cases, this wreaks havoc on certain services.
>
