Thanks, everybody, for the great input. If I understand it correctly, it doesn't help in this case: it just blindly restarts the service somewhere else once it loses the heartbeat. A partition doesn't happen only because of a network failure; it can be as simple as a JVM "stop-the-world" pause with a large heap, or pretty much anything else. In cases where two or more concurrently running instances of a service can wreak havoc on my cluster, I have to implement advanced coordination myself. That probably makes sense, because a production-level implementation for ZooKeeper has to use LeaderSelector or an equivalent, and the actual logic is probably tightly coupled to the service's business logic, e.g. deciding when to verify that the service is still the leader at the exact moment right before performing an action. I'm unsure whether there are enough general use cases for a simple, generic "race for a lock on service startup" implementation.
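To illustrate why "am I still the leader right now?" is hard to answer after a stop-the-world pause, here is a minimal sketch of one commonly used mitigation, fencing tokens (a technique I'm naming here, not something the framework above provides). The assumption is that the lock service hands out a strictly increasing token on every grant and the protected resource rejects any request carrying an older token than it has already seen, so a paused instance that wakes up still believing it is the leader cannot do damage. `LockService`, `grant_leadership` and `FencedResource` are hypothetical names for a single-process simulation, not ZooKeeper or Curator APIs.

```python
import threading


class LockService:
    """Hypothetical stand-in for a coordination service (e.g. a ZooKeeper-backed lock).

    Every grant returns a strictly increasing fencing token. For simplicity this
    simulation assumes the previous holder's lease has already expired on each grant.
    """

    def __init__(self):
        self._mutex = threading.Lock()
        self._token = 0
        self.holder = None

    def grant_leadership(self, owner):
        with self._mutex:
            self._token += 1
            self.holder = owner
            return self._token


class FencedResource:
    """Hypothetical protected resource that remembers the highest token it has seen
    and rejects writes that carry an older (stale) token."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._highest_token = 0
        self.log = []

    def write(self, token, value):
        with self._mutex:
            if token < self._highest_token:
                return False  # request from a stale leader: reject it
            self._highest_token = token
            self.log.append(value)
            return True


locks = LockService()
resource = FencedResource()

token_a = locks.grant_leadership("service-A")  # A wins the race on startup
# A's JVM pauses (e.g. long GC), its heartbeat is lost, and B is promoted.
token_b = locks.grant_leadership("service-B")

accepted_b = resource.write(token_b, "B: action")
accepted_a = resource.write(token_a, "A: action after pause")  # A still thinks it leads

print(resource.log)
```

The design choice is that the final check happens at the resource rather than in the service: even if A's "am I still leader?" check passed just before the pause, its write is rejected afterwards. This only works if the resource (database, storage, etc.) can be made to track tokens, which is exactly where the logic becomes coupled to the service's business logic.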
Petr