Thanks Ludwig,

I understand that it's a rather bogus case, but it is possible (though improbable) for this to occur (corrupted disks, whatever). Maybe there's no way around it.

I wasn't actually testing anything in particular in this case; I just happened to have executed this sequence of events and noticed that my clients didn't reconnect.

cheers

On Wed, Nov 12, 2014 at 1:19 PM, Ludwig Pummer <[email protected]> wrote:

> You are confounding your expired session testing with the wiping of all ZK
> data behind the scene.
>
> I seem to recall something about the client refusing to reconnect to a ZK
> server that does not have at least the Zxid it last saw. Though it's bad
> for your particular case, it's a good thing in general since it prevents
> the client from connecting to a quorum member which isn't up to date.
>
>
> On 11/11/2014 6:08 PM, Cameron McKenzie wrote:
>
>> Guys,
>> I have a (possibly somewhat contrived) issue relating to reconnection of a
>> client to ZK after quorum has been lost, and data has been corrupted.
>>
>> Essentially this is what's happening:
>> -Client connects to 3 node ZK cluster
>> -Client writes some ephemeral zNodes etc.
>> -All nodes in ZK cluster are shut down
>> -Contents of data/version-2 directories are removed on each ZK instance
>> (i.e. the acceptedEpoch, currentEpoch and all the snapshots and tran logs)
>> -Restart the nodes in the ZK cluster
>>
>> At this point, the ZK cluster comes up fine, but the client will not
>> automatically reconnect. Having stepped through the client code with a
>> debugger it seems like the server just doesn't respond to the session
>> initialisation request. These are the logs, which are repeated every
>> second. Note that if I restart the client, everything's fine.
>>
>> 12:56:35.978 [main-SendThread(ubuntubox:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>> ubuntubox/192.168.56.102:2181. Will not attempt to authenticate using
>> SASL (unknown error)
>> 12:56:35.980 [main-SendThread(ubuntubox:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Socket connection established to
>> ubuntubox/192.168.56.102:2181, initiating session
>> 12:56:35.983 [main-SendThread(ubuntubox:2181)] DEBUG
>> org.apache.zookeeper.ClientCnxn - Session establishment request sent on
>> ubuntubox/192.168.56.102:2181
>> 12:56:36.002 [main-SendThread(ubuntubox:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Unable to read additional data from
>> server sessionid 0x249a1b64cc90000, likely server has closed socket,
>> closing socket connection and attempting reconnect
>> 12:56:37.833 [main-SendThread(ubuntubox:2182)] INFO
>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>> ubuntubox/192.168.56.102:2182. Will not attempt to authenticate using
>> SASL (unknown error)
>> 12:56:37.834 [main-SendThread(ubuntubox:2182)] INFO
>> org.apache.zookeeper.ClientCnxn - Socket connection established to
>> ubuntubox/192.168.56.102:2182, initiating session
>> 12:56:37.835 [main-SendThread(ubuntubox:2182)] DEBUG
>> org.apache.zookeeper.ClientCnxn - Session establishment request sent on
>> ubuntubox/192.168.56.102:2182
>> 12:56:37.859 [main-SendThread(ubuntubox:2182)] INFO
>> org.apache.zookeeper.ClientCnxn - Unable to read additional data from
>> server sessionid 0x249a1b64cc90000, likely server has closed socket,
>> closing socket connection and attempting reconnect
>> 12:56:38.298 [main-SendThread(ubuntubox:2183)] INFO
>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>> ubuntubox/192.168.56.102:2183. Will not attempt to authenticate using
>> SASL (unknown error)
>> 12:56:38.299 [main-SendThread(ubuntubox:2183)] INFO
>> org.apache.zookeeper.ClientCnxn - Socket connection established to
>> ubuntubox/192.168.56.102:2183, initiating session
>> 12:56:38.300 [main-SendThread(ubuntubox:2183)] DEBUG
>> org.apache.zookeeper.ClientCnxn - Session establishment request sent on
>> ubuntubox/192.168.56.102:2183
>>
>> Can someone explain what's going on? Is this a bug? While I understand
>> that it's slightly contrived, the destruction of the data is certainly a
>> possibility, and having to restart every client even when the cluster
>> comes back up is not ideal.
>> cheers
>> Cam
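
For anyone reading this thread later, the zxid rule Ludwig recalls can be sketched roughly as below. This is only an illustration of the comparison, not ZooKeeper's actual code: the class name `ZxidCheckSketch`, the method `serverAccepts` and the zxid values are all hypothetical, and the sketch assumes that a freshly wiped server starts again from zxid 0 while a long-running client still remembers a higher one.

```java
/**
 * Illustrative sketch only. The names and values here are hypothetical
 * and are not taken from the ZooKeeper code base.
 */
final class ZxidCheckSketch {

    /**
     * Returns true if a server whose last processed transaction is
     * serverLastProcessedZxid may accept a client that has already seen
     * clientLastZxidSeen. A client should never be attached to a server
     * that is behind what the client has observed.
     */
    static boolean serverAccepts(long clientLastZxidSeen, long serverLastProcessedZxid) {
        return clientLastZxidSeen <= serverLastProcessedZxid;
    }

    public static void main(String[] args) {
        // Hypothetical zxid remembered by the client from before the wipe.
        long clientLastZxidSeen = 0x100000005L;
        // Assumption: the wiped ensemble has processed nothing yet.
        long wipedServerZxid = 0L;

        // The existing client is ahead of the server, so it is turned away.
        System.out.println("existing client accepted: "
                + serverAccepts(clientLastZxidSeen, wipedServerZxid));

        // A restarted client has seen nothing yet, so it is let in.
        System.out.println("restarted client accepted: "
                + serverAccepts(0L, wipedServerZxid));
    }
}
```

Under those assumptions the existing client is rejected on every attempt while a restarted one gets in, which would be consistent with the repeating "likely server has closed socket" lines and with the restart fixing things.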
