The way I would do it in a production cluster is *not* to use IP addresses
directly for the ZK ensemble, but to rely on some form of internal DNS and
use internally resolvable hostnames (e.g., {zk1, zk2, ...}.prod.example.com)
and have the provisioning tooling (Chef, Puppet, Ansible, what have you)
set the hostname when restarting or replacing a failed or crashed ZK server.

This way the list of ZK servers you give to Mesos never changes, even though
the FQDNs will map to different IPs / VMs. Obviously, this may not always be
desirable or feasible (e.g., if your prod environment does not support DNS
resolution).

You are correct that Mesos does not currently support dynamically changing
the ZK addresses, but I don't know whether that's a limitation of the Mesos
code or of the ZK C++ client driver. I'll look into it and let you know what
I find (if anything).

--
*Marco Massenzio*
Distributed Systems Engineer
http://codetrips.com

On Mon, Nov 9, 2015 at 6:01 AM, Donald Laidlaw <[email protected]> wrote:

> How do mesos masters and slaves react to zookeeper cluster changes? When
> the masters and slaves start they are given a set of addresses to connect
> to zookeeper. But over time, one of those zookeepers fails, and is replaced
> by a new server at a new address. How should this be handled in the mesos
> servers?
>
> I am guessing that mesos does not automatically detect and react to that
> change. But obviously we should do something to keep the mesos servers
> happy as well. What should we do?
>
> The obvious thing is to stop the mesos servers, one at a time, and restart
> them with the new configuration. But it would be really nice to be able to
> do this dynamically without restarting the server. After all, coordinating
> a rolling restart is a fairly hard job.
>
> Any suggestions or pointers?
>
> Best regards,
> Don Laidlaw
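
To illustrate the hostname approach from the reply above, here is a minimal Python sketch of how provisioning tooling might assemble the ZooKeeper URL handed to Mesos from stable hostnames rather than IPs. The `zk://host:port,.../path` form is the one Mesos accepts for its ZooKeeper URL; the helper name `zk_url`, the three-node ensemble, port 2181, and the `/mesos` path are assumptions for illustration and do not come from the thread.

```python
def zk_url(hostnames, port=2181, path="/mesos"):
    """Build a zk:// URL of the form zk://host1:port,host2:port/path.

    hostnames: stable DNS names of the ZK ensemble (hypothetical here).
    port:      ZooKeeper client port; 2181 is the usual default.
    path:      znode path used by Mesos; "/mesos" is an assumed value.
    """
    if not hostnames:
        raise ValueError("at least one ZooKeeper hostname is required")
    endpoints = ",".join(f"{host}:{port}" for host in hostnames)
    return f"zk://{endpoints}/{path.lstrip('/')}"


# Hypothetical ensemble: the names stay fixed, only the DNS records
# behind them change when a ZK server is replaced.
ZK_HOSTS = [f"zk{i}.prod.example.com" for i in range(1, 4)]

print(zk_url(ZK_HOSTS))
# zk://zk1.prod.example.com:2181,zk2.prod.example.com:2181,zk3.prod.example.com:2181/mesos
```

Because the URL is derived only from the hostnames, replacing a ZK server means updating its DNS record and leaving the Mesos configuration untouched. Whether running Mesos processes pick up the new address without a restart depends on how the ZK client re-resolves names, which is the open question in the reply.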

