Greetings,

As some of you already know, we've been using ZooKeeper at Canonical for a project we've been pushing (Ensemble, http://j.mp/dql6Fu). We've already written txzookeeper (http://j.mp/d3Zx7z) to integrate the Python bindings with Twisted, and we're also in the process of creating a Go binding for the C ZooKeeper library (to be released soon).
Yesterday, while working on the Go bindings, a test made me wonder about the correct way to reestablish a session with ZooKeeper. In another thread a couple of months ago, Ben mentioned:

> i'm a bit skeptical that this is going to work out properly. a server may
> receive a socket reset even though the client is still alive:
>
> 1) client sends a request to a server
> 2) client is partitioned from the server
> 3) server starts trying to send response
> 4) client reconnects to a different server
> 5) partition heals
> 6) server gets a reset from client
>
> at step 6 i don't think you want to delete the ephemeral nodes.

I also don't think it should delete ephemeral nodes. While performing some tests, though, I noticed that something similar to this may happen. The following sequence was performed in the test:

1) Establish connection A to ZK
2) Create an ephemeral node with A
3) Establish connection B to ZK, reusing the session from A
4) Close connection A
5) The ephemeral node from (2) got deleted.

So, this made me wonder about the proper way to reestablish a session in practice after a partition. Imagine that the reconnection in (3) was an attempt by the client to restore communication with the ZK cluster when faced with partitioning. Once the connection succeeds, the old resources from connection A should be disposed of, but how can this be done without risking killing the healthy connection B (imagine that the network comes back between (3) and (4))?

Does anyone have thoughts on that?

--
Gustavo Niemeyer
http://niemeyer.net
http://niemeyer.net/blog
http://niemeyer.net/twitter
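
P.S. To make the five-step test sequence concrete, here is a minimal toy model in Python. It is not the ZooKeeper API or the Go binding: `ToyServer` and `ToyConnection` are hypothetical names invented for illustration. It assumes that ephemeral nodes are owned by the session rather than by an individual connection, and that a clean close of any handle asks the server to end the whole session. Under those assumptions it reproduces what the test observed.

```python
class ToyServer:
    """In-memory stand-in for a ZK ensemble (hypothetical, not a real API)."""

    def __init__(self):
        self.next_session_id = 1
        self.ephemerals = {}  # path -> id of the session that owns the node

    def new_session(self):
        sid = self.next_session_id
        self.next_session_id += 1
        return sid

    def close_session(self, sid):
        # Ephemeral nodes belong to the session, so ending the session
        # removes them, no matter which connection asked for it.
        self.ephemerals = {
            path: owner for path, owner in self.ephemerals.items() if owner != sid
        }


class ToyConnection:
    """One client handle; several handles may share a session id."""

    def __init__(self, server, session_id=None):
        self.server = server
        if session_id is None:
            session_id = server.new_session()
        self.session_id = session_id

    def create_ephemeral(self, path):
        self.server.ephemerals[path] = self.session_id

    def exists(self, path):
        return path in self.server.ephemerals

    def close(self):
        # Assumption: a clean close ends the session, not just this socket.
        self.server.close_session(self.session_id)


server = ToyServer()
a = ToyConnection(server)                           # (1) connection A
a.create_ephemeral("/node")                         # (2) ephemeral node via A
b = ToyConnection(server, session_id=a.session_id)  # (3) B reuses A's session
a.close()                                           # (4) close connection A
print(b.exists("/node"))                            # (5) prints False
```

If that model is right, disposing of A safely would need some way to drop the local handle without sending the session close, which is exactly the part I'm unsure how to do.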