Thanks for bringing this up! This behavior is part of the ZooKeeper (ZK) C
client library. We have also seen slaves fail because of sporadic DNS lookup
failures in our clusters.

After speaking to a ZK expert, I believe one of the changes going into
3.5.0 is that only one of the zk hosts needs to resolve correctly, as you
said:
https://issues.apache.org/jira/browse/ZOOKEEPER-107

But I'm unfamiliar with the details of that ticket and with what they ended
up going forward with after all of the discussion.
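
In the meantime, the workaround you found (listing only the hosts that
resolve) could be automated before the service starts. Below is a minimal
sketch of that idea, not anything Mesos or ZooKeeper ships: the function name
resolvable_zk_url and its resolves parameter are hypothetical, and it assumes
a zk:// URL without credentials and a default port of 2181.

```python
import socket


def resolvable_zk_url(zk_url, resolves=None):
    """Return zk_url with every host that fails DNS resolution dropped.

    Hypothetical helper: assumes the form zk://host:port,host:port/path
    with no user:password@ section.
    """
    if resolves is None:
        # Default check: a host counts as usable if getaddrinfo succeeds.
        def resolves(host, port):
            try:
                socket.getaddrinfo(host, port)
                return True
            except socket.gaierror:
                return False

    prefix = "zk://"
    if not zk_url.startswith(prefix):
        raise ValueError("expected a zk:// URL")

    # Split "host:port,host:port" from the trailing "/path".
    hosts, sep, path = zk_url[len(prefix):].partition("/")

    kept = []
    for entry in hosts.split(","):
        host, _, port = entry.partition(":")
        if resolves(host, int(port or 2181)):
            kept.append(entry)

    if not kept:
        raise RuntimeError("no ZooKeeper host could be resolved")
    return prefix + ",".join(kept) + sep + path
```

The result could then be written to /etc/mesos/zk before starting the master
or slave. Note that this only hides hosts that are down at startup, so the
process will not learn about them again until it is restarted with the full
list.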


On Mon, Jul 28, 2014 at 11:07 PM, Itamar Ostricher <[email protected]>
wrote:

> Hi,
>
> I experimented today running mesos masters & slaves with multiple masters
> using zookeeper, by editing the /etc/mesos/zk file on all nodes (masters
> and slaves) to something like:
> zk://master1:2181,master2:2181,master3:2181/mesos
>
> I noticed that if not all masters are up when a master or slave mesos
> service is started, I get an error of the form:
>
> F0729 05:45:55.244169  2019 zookeeper.cpp:103] Failed to create ZooKeeper,
> zookeeper_init: No such file or directory [2]
>
> Googling the error I found a previous related thread [1], in which Thomas
> says that this happens when zookeeper is unable to resolve one of the
> hostnames.
> Indeed, when I changed the zk string to contain only masters that are up,
> it worked fine.
>
> My question is, how can this be a requirement? (and why?)
> The whole point of zookeeper is to allow high-availability when some of
> the masters are down, so naturally in such cases their hostnames will not
> be resolved...
> Is this something that occurs in mesos itself, or something in zookeeper?
>
> [1]
> http://mail-archives.apache.org/mod_mbox/mesos-user/201404.mbox/%3ccajrb3thcjbhd1bqjb0oevkqpawmst9-yxaqwrqo9rgft45x...@mail.gmail.com%3E
>
