Right, Marathon can't provide uniqueness guarantees. As you said, network
partitions are really common in distributed systems and shouldn't be
considered edge cases.

On Wed, Feb 24, 2016 at 8:49 AM, Petr Novak <[email protected]> wrote:

> Thanks everybody for the great input. If I understand it correctly, it
> doesn't help in this case; it just blindly restarts the service somewhere
> else once it loses the heartbeat. A partition doesn't happen only because
> of a network failure; it can be as simple as a JVM "stop the world" pause
> with a large heap, or pretty much anything else. In cases where two or more
> potentially running instances can wreak havoc on my cluster, I have to
> implement advanced coordination myself. That probably makes sense, because
> a production-level implementation on ZooKeeper has to use LeaderSelector or
> an equivalent, and the actual logic is probably quite tied to the business
> logic in the service, e.g. when to verify that the service is still the
> leader at the exact moment right before performing an action. I'm unsure
> whether there are enough general use cases for a simple, generic "race for
> a lock on service startup" implementation.
>
> Petr
>
>
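
To illustrate why a lock taken once at startup isn't enough, here is a
minimal sketch of the "check leadership right before acting" idea using
fencing tokens. Everything in it is hypothetical: LockService and Storage are
in-memory stand-ins I made up for illustration, not Marathon, ZooKeeper, or
Curator APIs. It assumes the protected resource can compare tokens itself.

```python
# Hypothetical in-memory sketch; not Marathon, ZooKeeper, or Curator API.

class LockService:
    """Stand-in for a coordination service that grants one lease at a time."""

    def __init__(self):
        self.token = 0       # monotonically increasing fencing token
        self.holder = None   # name of the current lease holder

    def acquire(self, name):
        # Granting a lease to a new holder implicitly revokes the old one,
        # as would happen when the old holder's session expires.
        self.token += 1
        self.holder = name
        return self.token


class Storage:
    """Stand-in for the protected resource; rejects stale fencing tokens."""

    def __init__(self):
        self.highest_seen = 0
        self.log = []

    def write(self, token, value):
        if token < self.highest_seen:
            return False  # writer lost leadership while it was paused
        self.highest_seen = token
        self.log.append(value)
        return True


locks = LockService()
storage = Storage()

token_a = locks.acquire("instance-a")       # A wins the startup race
first_ok = storage.write(token_a, "a-1")    # accepted

# A stalls (for example, a long GC pause); the scheduler starts B elsewhere.
token_b = locks.acquire("instance-b")
second_ok = storage.write(token_b, "b-1")   # accepted

# A resumes, still believing it is the leader.
stale_ok = storage.write(token_a, "a-2")    # rejected: A's token is older
```

The point of the sketch is that instance A never learns it was replaced, so
the check has to happen at the resource (or immediately before each action),
which is why this logic tends to end up inside the service itself.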
