Thanks all, figured it out - the env variable for the hostname passed into mesos-master was being set wrong. Thanks for the input!
Devin

On February 24, 2015 at 2:21:49 PM, Ken Sipe ([email protected]) wrote:

It appears your configuration is off, as you suspected. The master registration should NOT be 127.0.0.1 or 127.0.1.1. For each master, if you configure the IP in a file named ip under `/etc/mesos-master`, you should be good (after restarting the master).

My configuration under /etc/mesos-master looks like this:

/etc/mesos-master/
├── cluster
├── hostname
├── ip
├── quorum
├── registry
└── work_dir

These are just plain text files. ip has the internal IP of the master, hostname has the FQDN of the master, cluster is the name of the cluster, etc.

Good luck!
ken

On Feb 24, 2015, at 4:06 PM, Kenneth Su <[email protected]> wrote:

Hi Devin,

I am new to Mesos as well, and when I configured it I had the same problem as you. For your reference, my fix was to use the actual master IP instead; the slave then picked it up and connected. I suspect that when the master registers as 127.0.0.1, the slave uses that address to connect to itself, which is why it never reaches the master.

Hope it helps!
Kenneth

On Tue, Feb 24, 2015 at 2:50 PM, Devin Carlen <[email protected]> wrote:

Hello all,

I’m new to Mesos but have recently started trying to stand up a cluster using BOSH. There is a BOSH release for it at https://github.com/cf-platform-eng/mesos-boshrelease that is under active development.

I was able to successfully deploy the cluster, however the slaves are not communicating with the master. Upon investigation I found that the leader election is happening properly with ZooKeeper. For this test I only have 1 Mesos master, 3 Mesos slaves, and 1 ZooKeeper instance. All are running on their own VMs.
The single master gets elected upon startup:

I0224 21:20:40.716702 12024 contender.cpp:243] New candidate (id='0') has entered the contest for leadership
I0224 21:20:40.717182 12024 detector.cpp:134] Detected a new leader: (id='0')
I0224 21:20:40.717718 12030 group.cpp:629] Trying to get '/mesos/info_0000000000' in ZooKeeper
I0224 21:20:40.722229 12030 detector.cpp:351] A new leading master ([email protected]:80) is detected
I0224 21:20:40.722367 12030 master.cpp:734] The newly elected leader is [email protected]:80
I0224 21:20:40.722394 12030 master.cpp:742] Elected as the leading master!

I thought it odd that the IP listed here is 127.0.0.1. I have not specified localhost anywhere and I explicitly specify --ip=0.0.0.0 in my mesos-master command.

The slave sees the election happen, but then appears to connect to 127.0.0.1:80:

I0224 21:24:18.892083 17316 detector.cpp:134] Detected a new leader: (id='0')
I0224 21:24:18.892290 17316 group.cpp:629] Trying to get '/mesos/info_0000000000' in ZooKeeper
I0224 21:24:18.894039 17316 detector.cpp:351] A new leading master ([email protected]:80) is detected
I0224 21:24:18.894130 17316 slave.cpp:500] New master detected at [email protected]:80
I0224 21:24:18.894383 17316 slave.cpp:525] Detecting new master
I0224 21:24:18.894443 17316 status_update_manager.cpp:162] New master detected at [email protected]:80
I0224 21:24:18.894630 17320 slave.cpp:1957] [email protected]:80 exited
W0224 21:24:18.894665 17320 slave.cpp:1960] Master disconnected! Waiting for a new master to be elected

At this point the slave never successfully connects.
Just to verify, I also checked what ZooKeeper was reporting:

$ /zkCli.sh get /mesos/info_0000000000
201502242120-16777343-80-12000��P"[email protected]:80
cZxid = 0x20
ctime = Tue Feb 24 21:20:40 UTC 2015
mZxid = 0x20
mtime = Tue Feb 24 21:20:40 UTC 2015
pZxid = 0x20
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x14bbd711b6e0012
dataLength = 60
numChildren = 0

So somehow the IP 127.0.0.1 is written instead of the correct IP. Any thoughts on how I can fix this?

Best,
Devin
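
For anyone hitting the same symptom: the second field of the ID stored in the znode above (16777343) looks like the registered IP written as an integer. The Python sketch below decodes it, assuming the ID layout is <timestamp>-<ip as integer>-<port>-<pid> and that the integer holds the address bytes in little-endian order. Both are assumptions read off the output above, not something confirmed in this thread, and decode_master_ip is a hypothetical helper name.

```python
import ipaddress
import struct


def decode_master_ip(master_id: str) -> ipaddress.IPv4Address:
    """Decode the IP field of a master ID string.

    Assumed layout (not confirmed in the thread):
    <timestamp>-<ip as integer>-<port>-<pid>
    """
    fields = master_id.split("-")
    ip_as_int = int(fields[1])
    # Assumption: the integer stores the four address bytes in
    # little-endian order, so pack it that way to recover them.
    packed = struct.pack("<I", ip_as_int)
    return ipaddress.IPv4Address(packed)


if __name__ == "__main__":
    # ID string copied from the zkCli output in the thread.
    ip = decode_master_ip("201502242120-16777343-80-12000")
    print(ip, "loopback" if ip.is_loopback else "routable")
```

If the decoded address is a loopback address, the master has registered an address that slaves on other VMs cannot reach, which matches the behaviour described in the thread.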

