SyncDisconnected can occur for a variety of reasons, and it belongs to the class of recoverable errors. Your app needs to go into a waiting state until it receives either SyncConnected again or SessionExpired. Have you read http://wiki.apache.org/hadoop/ZooKeeper/ErrorHandling ?
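
As a rough illustration of that waiting state, here is a minimal sketch in plain Java. It is not ZooKeeper or Curator code: the class `ConnectionGate`, its `State` and `Decision` enums, and the `maxWaitMillis` parameter are all hypothetical names made up for this example. It also adds one assumption that goes beyond the advice above: a local deadline, so that a server partitioned away from the ensemble gives up on its own instead of waiting forever. Choosing that deadline (for example, something close to the session timeout) is left to the application.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper for illustration; not part of ZooKeeper or Curator. */
class ConnectionGate {
    // Local stand-ins for the three connection states discussed in this thread.
    enum State { CONNECTED, DISCONNECTED, EXPIRED }

    // What the server should do with client requests right now.
    enum Decision { SERVE, WAIT, SHUT_DOWN }

    private final long maxWaitMillis;   // assumed upper bound on the waiting state
    private final LongSupplier clock;   // injected so the logic can be tested
    private State state = State.DISCONNECTED;
    private long disconnectedAt;

    ConnectionGate(long maxWaitMillis, LongSupplier clock) {
        this.maxWaitMillis = maxWaitMillis;
        this.clock = clock;
        this.disconnectedAt = clock.getAsLong();
    }

    /** Feed every connection state change into the gate. */
    synchronized void onStateChange(State newState) {
        if (state == State.EXPIRED) {
            return; // expiry is final for this session
        }
        if (newState == State.DISCONNECTED && state != State.DISCONNECTED) {
            disconnectedAt = clock.getAsLong(); // start of the waiting state
        }
        state = newState;
    }

    /** Decide whether to serve, hold requests, or give up. */
    synchronized Decision decide() {
        switch (state) {
            case CONNECTED:
                return Decision.SERVE;
            case EXPIRED:
                return Decision.SHUT_DOWN;
            default:
                long waited = clock.getAsLong() - disconnectedAt;
                return waited >= maxWaitMillis ? Decision.SHUT_DOWN : Decision.WAIT;
        }
    }
}
```

In a real client, `onStateChange` would presumably be driven from a ZooKeeper `Watcher` by mapping `KeeperState.SyncConnected`, `KeeperState.Disconnected`, and `KeeperState.Expired` onto the local enum. The local deadline matters because a partitioned client is only told about session expiry once it reconnects to the ensemble.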
You should consider using one of the high level ZooKeeper frameworks (such as Curator which I wrote).

-Jordan

On Mar 13, 2013, at 2:01 PM, Ivan Kelly <[email protected]> wrote:

> Hi guys,
>
> We have a usecase here where zookeeper is used to coordinate ownership
> of partitions of a resource. When one server dies, the partition
> should be moved to another server, etc. The action we need to take on
> SessionExpired is very clear. We just kill the server.
>
> However it is unclear what we should do on a SyncDisconnected. We
> can't just kill our server, as it may have just been one zookeeper
> server failing. If we block all client requests to our server while we
> wait for SyncConnected, we may block forever in the case that our
> server is partitioned away from the zk cluster. If we continue to
> serve requests, we risk split brain[1].
>
> What have people done in the past to resolve issues like this?
>
> -Ivan
>
>
> [1] This is a risk anyhow without proper fencing, but a limited amount
> is ok in our application.
