From the logs, it looks like the master is binding to its loopback address (127.0.0.1) and publishing that to ZK. So the slave is trying to reach the master on its loopback interface, which fails.
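One quick way to confirm this on the master node is to check which address the host's own hostname resolves to. Below is a minimal Python sketch. It assumes (not confirmed in this thread) that the master, when started without "--ip", advertises whatever its hostname resolves to. On Ubuntu, /etc/hosts commonly maps the hostname to 127.0.1.1, and that mapping would produce exactly this symptom. The function name hostname_resolves_to_loopback is hypothetical.

```python
import ipaddress
import socket


def hostname_resolves_to_loopback(hostname=None):
    """Return (address, is_loopback) for the IPv4 address a hostname resolves to.

    Assumption: the master, when started without --ip, advertises whatever
    its own hostname resolves to, so a loopback result here would explain a
    127.x.x.x address being published to ZooKeeper.
    """
    name = hostname or socket.gethostname()
    # IPv4-only lookup; consults /etc/hosts before DNS on a typical Ubuntu host.
    addr = socket.gethostbyname(name)
    # is_loopback covers all of 127.0.0.0/8, including Ubuntu's 127.0.1.1.
    return addr, ipaddress.ip_address(addr).is_loopback


if __name__ == "__main__":
    addr, loopback = hostname_resolves_to_loopback()
    if loopback:
        print(f"hostname resolves to {addr} (loopback); set --ip explicitly")
    else:
        print(f"hostname resolves to {addr}")
```

Run it on the master (10.1.100.116). If it reports a loopback address, set the "--ip" flag explicitly rather than relying on hostname resolution.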
Start the master with the "--ip" flag set to its visible IP (10.1.100.116). Mesosphere probably has a file (/etc/defaults/mesos-master?) to set these flags.

On Mon, Aug 25, 2014 at 3:26 PM, Frank Hinek <[email protected]> wrote:

> Logs attached from master, slave, and zookeeper after a reboot of both
> nodes.
>
> On August 25, 2014 at 1:14:07 PM, Vinod Kone ([email protected]) wrote:
>
> what do the master and slave logs say?
>
> On Mon, Aug 25, 2014 at 9:03 AM, Frank Hinek <[email protected]>
> wrote:
>
>> I was able to get a single-node environment set up on Ubuntu 14.04.1
>> following this guide: http://mesosphere.io/learn/install_ubuntu_debian/
>>
>> The single slave registered with the master via the local Zookeeper and
>> I could run basic commands by posting to Marathon.
>>
>> I then tried to build a multi-node cluster following this guide:
>> http://mesosphere.io/docs/mesosphere/getting-started/cloud-install/
>>
>> The guide walks you through using the Mesosphere packages to install
>> Mesos, Marathon, and Zookeeper on one node that will be the master, and on
>> the slave just Mesos. You then disable automatic start of: mesos-slave on
>> the master, mesos-master on the slave, and zookeeper on the slave. It ends
>> up looking like:
>>
>> NODE 1 (MASTER):
>> - IP Address: 10.1.100.116
>> - mesos-master
>> - marathon
>> - zookeeper
>>
>> NODE 2 (SLAVE):
>> - IP Address: 10.1.100.117
>> - mesos-slave
>>
>> The issue I’m running into is that the slave is rarely able to register
>> with the master via Zookeeper. I can never run any jobs from
>> marathon (just trying a simple sleep 5 command). Even when the slave does
>> register, the Mesos UI shows 1 “Deactivated” slave; it never goes active.
>>
>> Here are the values I have for /etc/mesos/zk:
>>
>> MASTER: zk://10.1.100.116:2181/mesos
>> SLAVE: zk://10.1.100.116:2181/mesos
>>
>> Any ideas of what to troubleshoot? Would greatly appreciate pointers.
>>
>> Environment details:
>> - Ubuntu Server 14.04.1 running as VMs on ESXi 5.5U1
>> - Mesos: 0.20.0
>> - Marathon 0.6.1
>>
>> There are no apparent connectivity issues, and I’m not having any
>> problems with other VMs on the ESXi host. All VM-to-VM communication is on
>> the same VLAN and within the same host.
>>
>> Zookeeper log on master (slave briefly registered so I tried to run a
>> sleep 5 command from marathon and then the slave disconnected):
>>
>> 2014-08-25 11:50:34,976 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.1.100.117:45778
>> 2014-08-25 11:50:34,977 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@793] - Connection request from old client /10.1.100.117:45778; will be dropped if server is in r-o mode
>> 2014-08-25 11:50:34,977 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@839] - Client attempting to establish new session at /10.1.100.117:45778
>> 2014-08-25 11:50:34,978 - INFO [SyncThread:0:ZooKeeperServer@595] - Established session 0x1480b22f7f0000c with negotiated timeout 10000 for client /10.1.100.117:45778
>> 2014-08-25 11:51:05,724 - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x1480b22f7f00001 type:create cxid:0x53faafa9 zxid:0x49 txntype:-1 reqpath:n/a Error Path:/marathon Error:KeeperErrorCode = NodeExists for /marathon
>> 2014-08-25 11:51:05,724 - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x1480b22f7f00001 type:create cxid:0x53faafaa zxid:0x4a txntype:-1 reqpath:n/a Error Path:/marathon/state Error:KeeperErrorCode = NodeExists for /marathon/state
>> 2014-08-25 11:51:09,145 - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x1480b22f7f00001 type:create cxid:0x53faafb5 zxid:0x4d txntype:-1 reqpath:n/a Error Path:/marathon Error:KeeperErrorCode = NodeExists for /marathon
>> 2014-08-25 11:51:09,146 - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x1480b22f7f00001 type:create cxid:0x53faafb6 zxid:0x4e txntype:-1 reqpath:n/a Error Path:/marathon/state Error:KeeperErrorCode = NodeExists for /marathon/state

