Thanks, everybody, for the great input. If I understand correctly, it
doesn't help in this case: it just blindly restarts the service somewhere
else once it loses the heartbeat. A partition doesn't happen only because
of a network failure; it can be as simple as a JVM "stop the world" pause
with a large heap, or pretty much anything else. In cases where two or more
concurrently running instances of a service could wreak havoc on my
cluster, I have to implement the advanced coordination myself. That
probably makes sense, because a production-level implementation on
ZooKeeper has to use LeaderSelector or an equivalent, and the actual logic
is probably closely tied to the business logic of the service, e.g.
deciding when to verify that the service is still the leader at the exact
moment right before performing an action. I'm unsure whether there are
enough general use cases for a simple, generic "race for a lock on service
startup" implementation.
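
To make the "check right before performing an action" point concrete,
here is a minimal sketch of what I mean. LeaderGuard and runIfLeader are
hypothetical names I made up for illustration, not part of any library; I
am assuming the leadership check would be backed by something like
LeaderSelector, but here it is just a plain BooleanSupplier. Note that a
pause between the check and the action can still let a stale leader
through, so this only narrows the window rather than closing it.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: runs an action only while a leadership check passes. */
final class LeaderGuard {
    // Assumption: in a real service this would ask the coordination layer
    // (e.g. a ZooKeeper-backed leader election) whether we still lead.
    private final BooleanSupplier isLeader;

    LeaderGuard(BooleanSupplier isLeader) {
        this.isLeader = isLeader;
    }

    /** Returns true if the action ran, false if leadership was not held. */
    boolean runIfLeader(Runnable action) {
        // Re-check immediately before acting. This is check-then-act, so a
        // long GC pause right after the check is still not covered.
        if (!isLeader.getAsBoolean()) {
            return false;
        }
        action.run();
        return true;
    }
}
```

Where exactly such a check belongs depends on the service's business
logic, which is why I doubt a generic version would cover much.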

Petr
