Hello all, I’m new to Mesos but have recently started trying to stand up a cluster using BOSH. There is a BOSH release for it at https://github.com/cf-platform-eng/mesos-boshrelease that is under active development.
I was able to deploy the cluster successfully, but the slaves are not communicating with the master. Upon investigation I found that leader election is happening properly with ZooKeeper. For this test I have only 1 Mesos master, 3 Mesos slaves, and 1 ZooKeeper instance, each running on its own VM. The single master gets elected upon startup:

I0224 21:20:40.716702 12024 contender.cpp:243] New candidate (id='0') has entered the contest for leadership
I0224 21:20:40.717182 12024 detector.cpp:134] Detected a new leader: (id='0')
I0224 21:20:40.717718 12030 group.cpp:629] Trying to get '/mesos/info_0000000000' in ZooKeeper
I0224 21:20:40.722229 12030 detector.cpp:351] A new leading master (master@127.0.0.1:80) is detected
I0224 21:20:40.722367 12030 master.cpp:734] The newly elected leader is master@127.0.0.1:80
I0224 21:20:40.722394 12030 master.cpp:742] Elected as the leading master!

I thought it odd that the IP listed here is 127.0.0.1. I have not specified localhost anywhere, and I explicitly pass --ip=0.0.0.0 in my mesos-master command.

The slave sees the election happen, but then appears to connect to 127.0.0.1:80:

I0224 21:24:18.892083 17316 detector.cpp:134] Detected a new leader: (id='0')
I0224 21:24:18.892290 17316 group.cpp:629] Trying to get '/mesos/info_0000000000' in ZooKeeper
I0224 21:24:18.894039 17316 detector.cpp:351] A new leading master (master@127.0.0.1:80) is detected
I0224 21:24:18.894130 17316 slave.cpp:500] New master detected at master@127.0.0.1:80
I0224 21:24:18.894383 17316 slave.cpp:525] Detecting new master
I0224 21:24:18.894443 17316 status_update_manager.cpp:162] New master detected at master@127.0.0.1:80
I0224 21:24:18.894630 17320 slave.cpp:1957] master@127.0.0.1:80 exited
W0224 21:24:18.894665 17320 slave.cpp:1960] Master disconnected! Waiting for a new master to be elected

At this point the slave never successfully connects.
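
One thing that may be worth checking on the master VM, offered as an assumption and not a confirmed diagnosis: when the master is bound to the wildcard address 0.0.0.0, the address it advertises may come from resolving the machine's own hostname, and some VM images map the hostname to a loopback address in /etc/hosts. The Python sketch below only inspects how the local hostname resolves; the helper names `is_loopback` and `check_local_hostname` are made up for illustration and are not part of Mesos or BOSH.

```python
import ipaddress
import socket


def is_loopback(address: str) -> bool:
    """Return True if the address falls in a loopback range such as 127.0.0.0/8."""
    return ipaddress.ip_address(address).is_loopback


def check_local_hostname() -> str:
    """Resolve this machine's hostname and describe the result.

    Assumption: the advertised master address is derived from a lookup
    like this one. That is a guess about the cause, not a documented fact.
    """
    hostname = socket.gethostname()
    try:
        address = socket.gethostbyname(hostname)
    except OSError as err:  # socket.gaierror is a subclass of OSError
        return f"{hostname} does not resolve: {err}"
    if is_loopback(address):
        return f"{hostname} resolves to loopback address {address}"
    return f"{hostname} resolves to {address}"


if __name__ == "__main__":
    print(check_local_hostname())
```

If this reports a loopback address on the master, that would be consistent with the 127.0.0.1 seen in the logs, and passing the VM's routable address explicitly instead of 0.0.0.0 would be a reasonable next experiment.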
Just to verify, I also checked what ZooKeeper was reporting:

$ /zkCli.sh get /mesos/info_0000000000
201502242120-16777343-80-12000��P"master@127.0.0.1:80
cZxid = 0x20
ctime = Tue Feb 24 21:20:40 UTC 2015
mZxid = 0x20
mtime = Tue Feb 24 21:20:40 UTC 2015
pZxid = 0x20
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x14bbd711b6e0012
dataLength = 60
numChildren = 0

So somehow the IP 127.0.0.1 is being written instead of the master's real IP. Any thoughts on how I can fix this?

Best,
Devin
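
As a cross-check on the ZooKeeper output, the second field of the ID at the start of the znode data (16777343) looks like an IPv4 address stored as an integer. The sketch below decodes it under the assumption that the ID is laid out as timestamp, IP, port, PID and that the integer was written on a little-endian host; that layout is inferred from the values shown here, not taken from Mesos documentation, and `decode_master_id_ip` is a hypothetical helper name.

```python
import socket
import struct


def decode_master_id_ip(master_id: str) -> str:
    """Extract the dotted-quad IP from an ID like '201502242120-16777343-80-12000'.

    Assumed layout (inferred, not documented): <timestamp>-<ip>-<port>-<pid>.
    """
    ip_field = int(master_id.split("-")[1])
    # Pack as a little-endian unsigned 32-bit integer so the lowest byte
    # becomes the first octet, then render the four bytes as a dotted quad.
    return socket.inet_ntoa(struct.pack("<I", ip_field))


if __name__ == "__main__":
    print(decode_master_id_ip("201502242120-16777343-80-12000"))  # 127.0.0.1
```

The result matches the address in the logs, which suggests the master registered itself with the loopback address and that the znode is not simply being misread by the slaves.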

