I don't see strong value here. A few failures would be detected more quickly, but I am not convinced that this would improve functionality significantly.
On Tue, Sep 10, 2013 at 1:01 PM, Jeremy Stribling <[email protected]> wrote:

> Hi all,
>
> Let's assume that you wanted to deploy ZK in a virtualized environment,
> despite all of the known drawbacks. Assume we could deploy it such that
> the ZK servers were all using independent CPUs and storage (though not
> dedicated disks). Obviously, the shared disks (shared with other, non-ZK
> VMs on the same hypervisor) will cause ZK to hit the default session
> timeout occasionally, so you would need to raise the existing session
> timeout to something like 30 seconds.
>
> I'm curious if there would be any technical drawbacks to adding an
> additional heartbeat mechanism between the clients and the servers, which
> would have the goal of detecting network-only failures faster than the
> existing heartbeat mechanism. The idea is that there would be a new thread
> dedicated to processing these heartbeats, which would not get blocked on
> I/O. Then the clients could configure a second, smaller timeout value, and
> it would be assumed that any such timeout indicated a real problem. The
> existing mechanism would still be in place to catch I/O-related errors.
>
> I understand the philosophy that there should be some heartbeat mechanism
> that takes the disk into account, but I'm having trouble coming up with
> technical reasons not to add a second mechanism. Obviously, the advantage
> would be that the clients could detect network failures and system crashes
> more quickly in an environment with slow disks, and fail over to other
> servers more quickly. The only disadvantages I can come up with are:
>
> 1) More code complexity, and slightly more heartbeat traffic on the wire
> 2) I think the servers have to log session expirations to disk, so if the
> sessions expire at a faster rate than the disk can handle, it might lead to
> a large backlog.
>
> Are there other drawbacks I am missing? Would a patch that added
> something like this be considered, or is it dead from the start?
>
> Thanks,
>
> Jeremy
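
To make sure we are talking about the same thing, here is a rough sketch of the two-timeout idea as seen from the client. It is plain Python with no ZooKeeper API involved, and every name in it (DualTimeoutMonitor, fast_timeout, LinkState, and so on) is hypothetical and chosen only for illustration. It assumes the fast heartbeat is answered by a server thread that never touches the disk, while the regular session ping can stall behind slow I/O.

```python
from dataclasses import dataclass
from enum import Enum


class LinkState(Enum):
    HEALTHY = "healthy"
    NETWORK_SUSPECT = "network_suspect"  # fast heartbeat missed: likely network failure or crash
    SESSION_EXPIRED = "session_expired"  # regular, disk-aware session timeout elapsed


@dataclass
class DualTimeoutMonitor:
    """Client-side bookkeeping for two independent heartbeat timeouts (illustrative only)."""

    fast_timeout: float     # hypothetical small timeout for the I/O-free heartbeat, in seconds
    session_timeout: float  # existing session timeout, e.g. raised to 30 s on slow disks
    last_fast_reply: float = 0.0
    last_session_reply: float = 0.0

    def on_fast_reply(self, now: float) -> None:
        # Reply from the dedicated heartbeat thread (assumed never to block on disk).
        self.last_fast_reply = now

    def on_session_reply(self, now: float) -> None:
        # Reply to the regular session ping, which may be delayed by slow I/O.
        self.last_session_reply = now

    def state(self, now: float) -> LinkState:
        # The regular timeout is checked first, so I/O-related failures are still caught.
        if now - self.last_session_reply > self.session_timeout:
            return LinkState.SESSION_EXPIRED
        if now - self.last_fast_reply > self.fast_timeout:
            return LinkState.NETWORK_SUSPECT
        return LinkState.HEALTHY


monitor = DualTimeoutMonitor(fast_timeout=2.0, session_timeout=30.0)
monitor.on_fast_reply(0.0)
monitor.on_session_reply(0.0)

# Slow disk: session replies stall, fast replies keep arriving.
monitor.on_fast_reply(10.0)
slow_disk_state = monitor.state(11.0)

# Fast replies stop as well: the client could fail over well before 30 s.
network_loss_state = monitor.state(13.0)
```

In this sketch a client would try another server as soon as it sees NETWORK_SUSPECT. What it does not answer is what the servers should do with the session in that window, which is where the session-expiration logging concern in point 2 comes in.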
