Right, Marathon can't provide uniqueness guarantees. As you said, network partitions are really common in distributed systems and shouldn't be considered edge cases.
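One way to stay safe even when a stalled instance wakes up and still believes it is the leader is to have the shared resource itself reject stale writers, often called a fencing token. Here is a minimal sketch of that idea, not anything Marathon or ZooKeeper ships: `LockService` and `Store` are hypothetical in-memory stand-ins, and in a real setup the token would have to come from the coordinator (for example a monotonically increasing version number).

```python
class LockService:
    """Hypothetical stand-in for a coordinator that grants leadership."""

    def __init__(self):
        self._token = 0

    def acquire(self, owner):
        # Every grant hands out a strictly larger fencing token.
        self._token += 1
        return self._token


class Store:
    """Hypothetical shared resource that enforces fencing on writes."""

    def __init__(self):
        self._highest_seen = 0
        self.log = []

    def write(self, token, value):
        # Refuse any writer whose token is older than one already seen.
        if token < self._highest_seen:
            return False
        self._highest_seen = token
        self.log.append(value)
        return True


locks = LockService()
store = Store()

# Instance A becomes leader and writes successfully.
token_a = locks.acquire("instance-a")
first_write = store.write(token_a, "a-1")

# A stalls (say, a long GC pause), its session expires, and a
# replacement instance B is started and takes over leadership.
token_b = locks.acquire("instance-b")
b_write = store.write(token_b, "b-1")

# A wakes up still believing it is leader; the store refuses the write.
stale_write = store.write(token_a, "a-2")
```

The point of the design is that correctness no longer depends on the old instance noticing it lost leadership in time; the check happens at the resource, at the moment of the action.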
On Wed, Feb 24, 2016 at 8:49 AM, Petr Novak <[email protected]> wrote:

> Thanks everybody for the great input. If I understand it correctly, it
> doesn't help in this case; it just blindly restarts the service somewhere
> else once it loses the heartbeat. A partition doesn't happen only because
> of network failure; it can be as simple as a JVM "stop the world" pause
> with a large heap, or pretty much anything. In cases where two or more
> potentially running services can wreak havoc on my cluster, I have to
> implement advanced coordination myself. It probably makes sense, because
> a production-level implementation for Zoo has to use LeaderSelector or an
> equivalent, and the actual logic is probably quite connected to the
> business logic in the service, e.g. when to ensure that the service is
> still leader at this exact moment right before performing an action.
> Unsure if there are enough general use cases for a simple "racing for a
> lock on service startup" generic implementation.
>
> Petr

