Thanks all, figured it out: the environment variable for the hostname passed 
into mesos-master was being set incorrectly. Thanks for the input!

Devin




On February 24, 2015 at 2:21:49 PM, Ken Sipe ([email protected]) wrote:

It appears your configuration is off, as you suspected.  The master 
registration should NOT be 127.0.0.1 or 127.0.1.1.  For each master, if you 
configure the IP in a file named ip under `/etc/mesos-master`, you should be 
good (after restarting the master).

My configuration under /etc/mesos-master looks like this:
/etc/mesos-master/
├── cluster
├── hostname
├── ip
├── quorum
├── registry
└── work_dir

These are just plain text files.  ip has the internal IP of the master, 
hostname has the FQDN of the master, cluster is the name of the cluster, etc.
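
As a rough illustration of the layout Ken describes, the two files most relevant to this problem could be written as below. This is only a sketch: the function name write_master_conf, the address 10.0.0.5, and the name master1.example.com are made up for the example, and it assumes the packaging reads one value per file as described above. It writes to a scratch directory so nothing under /etc is touched.

```shell
# Hypothetical helper: write the ip and hostname files into a config directory.
# Arguments: <config dir> <internal IP> <FQDN>
write_master_conf() {
    conf_dir=$1
    ip=$2
    fqdn=$3
    mkdir -p "$conf_dir" || return 1
    # One value per plain text file, as in the listing above.
    printf '%s\n' "$ip"   > "$conf_dir/ip"
    printf '%s\n' "$fqdn" > "$conf_dir/hostname"
}

# Example run against a scratch directory (placeholder values, not real hosts).
scratch=$(mktemp -d)
write_master_conf "$scratch" 10.0.0.5 master1.example.com
cat "$scratch/ip" "$scratch/hostname"
```

On a real master the first argument would be /etc/mesos-master, followed by a restart of the master as noted above.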

good luck!
ken

On Feb 24, 2015, at 4:06 PM, Kenneth Su <[email protected]> wrote:

Hi Devin,

I am new to Mesos as well; I just configured it and had the same problem as 
you.

For your reference, my fix was to use the actual master IP instead; the slave 
then picked it up and connected. I suspect that with 127.0.0.1, the slave uses 
that address to connect to itself, and that is why it never reaches the master.

Hope it helps!

Kenneth

On Tue, Feb 24, 2015 at 2:50 PM, Devin Carlen <[email protected]> wrote:
Hello all,

I’m new to Mesos but have recently started trying to stand up a cluster using 
BOSH.  There is a BOSH release for it at 
https://github.com/cf-platform-eng/mesos-boshrelease that is under active 
development.

I was able to deploy the cluster successfully, but the slaves are not 
communicating with the master.  Upon investigation I found that the leader 
election is happening properly with ZooKeeper.  For this test I have only 1 
Mesos master, 3 Mesos slaves, and 1 ZooKeeper instance, all running on their 
own VMs.  The single master gets elected upon startup:

I0224 21:20:40.716702 12024 contender.cpp:243] New candidate (id='0') has 
entered the contest for leadership
I0224 21:20:40.717182 12024 detector.cpp:134] Detected a new leader: (id='0')
I0224 21:20:40.717718 12030 group.cpp:629] Trying to get 
'/mesos/info_0000000000' in ZooKeeper
I0224 21:20:40.722229 12030 detector.cpp:351] A new leading master 
([email protected]:80) is detected
I0224 21:20:40.722367 12030 master.cpp:734] The newly elected leader is 
[email protected]:80
I0224 21:20:40.722394 12030 master.cpp:742] Elected as the leading master!

I thought it odd that the IP listed here is 127.0.0.1.  I have not specified 
localhost anywhere and I explicitly specify --ip=0.0.0.0 in my mesos-master 
command.
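
One way to narrow this down, offered as a sketch rather than a confirmed diagnosis: a master bound to 0.0.0.0 still has to advertise some concrete address, and a hostname that resolves to a loopback entry is one plausible source of 127.0.0.1. The helper is_loopback below is hypothetical; it only tests whether a dotted-quad IPv4 address falls in 127.0.0.0/8.

```shell
# Hypothetical helper: succeed if the given dotted-quad IPv4 address is in 127.0.0.0/8.
is_loopback() {
    case $1 in
        127.*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Sample addresses; 10.0.0.5 is a placeholder for a real internal IP.
for addr in 127.0.0.1 127.0.1.1 10.0.0.5; do
    if is_loopback "$addr"; then
        echo "$addr: loopback, slaves on other hosts cannot reach this"
    else
        echo "$addr: not loopback"
    fi
done

# On the master itself, the address to test could be looked up with, e.g.:
#   getent hosts "$(hostname -f)"
```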

The slave sees the election happen, but then appears to connect to 127.0.0.1:80:

I0224 21:24:18.892083 17316 detector.cpp:134] Detected a new leader: (id='0')
I0224 21:24:18.892290 17316 group.cpp:629] Trying to get 
'/mesos/info_0000000000' in ZooKeeper
I0224 21:24:18.894039 17316 detector.cpp:351] A new leading master 
([email protected]:80) is detected
I0224 21:24:18.894130 17316 slave.cpp:500] New master detected at 
[email protected]:80
I0224 21:24:18.894383 17316 slave.cpp:525] Detecting new master
I0224 21:24:18.894443 17316 status_update_manager.cpp:162] New master detected 
at [email protected]:80
I0224 21:24:18.894630 17320 slave.cpp:1957] [email protected]:80 exited
W0224 21:24:18.894665 17320 slave.cpp:1960] Master disconnected! Waiting for a 
new master to be elected

At this point the slave never successfully connects.  Just to verify, I also 
checked what ZooKeeper was reporting:

$ /zkCli.sh get /mesos/info_0000000000

201502242120-16777343-80-12000��P"[email protected]:80
cZxid = 0x20
ctime = Tue Feb 24 21:20:40 UTC 2015
mZxid = 0x20
mtime = Tue Feb 24 21:20:40 UTC 2015
pZxid = 0x20
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x14bbd711b6e0012
dataLength = 60
numChildren = 0

So somehow the IP 127.0.0.1 is written instead of the correct IP.  Any thoughts 
on how I can fix this?
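
A side observation on the record above: the second field of the ID, 16777343, looks like the same address stored as an integer. Assuming it is an IPv4 address whose first octet sits in the lowest byte (an interpretation, not something stated in this thread), it can be unpacked with plain shell arithmetic. The function name int_to_ipv4 is made up for this sketch.

```shell
# Hypothetical helper: print a 32-bit integer as dotted-quad IPv4,
# taking the lowest byte as the first octet.
int_to_ipv4() {
    n=$1
    printf '%d.%d.%d.%d\n' \
        $(( n & 255 )) \
        $(( (n >> 8) & 255 )) \
        $(( (n >> 16) & 255 )) \
        $(( (n >> 24) & 255 ))
}

# 16777343 = 0x0100007F, i.e. bytes 7F 00 00 01 from lowest to highest.
int_to_ipv4 16777343   # prints 127.0.0.1
```

If that reading is right, the loopback address is baked into the ID as well as the advertised address, which is consistent with the master itself choosing 127.0.0.1 at startup.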

Best,

Devin

