What do the master and slave logs say?
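While collecting those, it may also be worth confirming from the slave that ZooKeeper on the master actually answers. Below is a minimal sketch, assuming Python 3 is available on the slave and that the value in /etc/mesos/zk is the one quoted below. `parse_zk_url` and `zk_ruok` are hypothetical helper names, not part of Mesos or ZooKeeper. It relies on ZooKeeper's `ruok` four-letter command, which newer ZooKeeper releases may require you to whitelist first.

```python
import socket
from urllib.parse import urlparse


def parse_zk_url(url):
    """Split zk://host:port[,host:port...]/path into ([(host, port), ...], path).

    Assumption: no user:password@ prefix in the URL.
    """
    parsed = urlparse(url)
    if parsed.scheme != "zk":
        raise ValueError("expected a zk:// URL, got %r" % url)
    hosts = []
    for entry in parsed.netloc.split(","):
        host, _, port = entry.partition(":")
        # 2181 is ZooKeeper's default client port.
        hosts.append((host, int(port) if port else 2181))
    return hosts, parsed.path


def zk_ruok(host, port, timeout=3.0):
    """Send 'ruok' to a ZooKeeper server; a healthy server replies 'imok'."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"ruok")
        return sock.recv(16) == b"imok"


if __name__ == "__main__":
    # Value taken from the /etc/mesos/zk setting quoted in this thread.
    hosts, znode = parse_zk_url("zk://10.1.100.116:2181/mesos")
    for host, port in hosts:
        try:
            status = "ok" if zk_ruok(host, port) else "unexpected reply"
        except OSError as exc:
            status = "unreachable (%s)" % exc
        print("%s:%d %s -> %s" % (host, port, znode, status))
```

This only shows that the ZooKeeper port is reachable and responding; it says nothing about whether the master can connect back to the slave, so the Mesos logs are still the main thing to look at.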

On Mon, Aug 25, 2014 at 9:03 AM, Frank Hinek <[email protected]> wrote:
> I was able to get a single node environment setup on Ubuntu 14.04.1
> following this guide: http://mesosphere.io/learn/install_ubuntu_debian/
>
> The single slave registered with the master via the local Zookeeper and I
> could run basic commands by posting to Marathon.
>
> I then tried to build a multi node cluster following this guide:
> http://mesosphere.io/docs/mesosphere/getting-started/cloud-install/
>
> The guide walks you through using the Mesosphere packages to install
> Mesos, Marathon, and Zookeeper on one node that will be the master and on
> the slave just Mesos. You then disable automatic start of: mesos-slave on
> the master, mesos-master on the slave, and zookeeper on the slave. It ends
> up looking like:
>
> NODE 1 (MASTER):
> - IP Address: 10.1.100.116
> - mesos-master
> - marathon
> - zookeeper
>
> NODE 2 (SLAVE):
> - IP Address: 10.1.100.117
> - mesos-slave
>
> The issue I’m running into is that the slave is rarely able to register
> with the master using Zookeeper. I can never run any jobs from
> marathon (just trying a simple sleep 5 command). Even when the slave does
> register the Mesos UI shows 1 “Deactivated” slave; it never goes active.
>
> Here are the values I have for /etc/mesos/zk:
>
> MASTER: zk://10.1.100.116:2181/mesos
> SLAVE: zk://10.1.100.116:2181/mesos
>
> Any ideas of what to troubleshoot? Would greatly appreciate pointers.
>
> Environment details:
> - Ubuntu Server 14.04.1 running as VMs on ESXi 5.5U1
> - Mesos: 0.20.0
> - Marathon 0.6.1
>
> There are no apparent connectivity issues, and I’m not having any problems
> with other VMs on the ESXi host. All VM to VM communication is on the same
> VLAN and within the same host.
>
> Zookeeper log on master (slave briefly registered so I tried to run a
> sleep 5 command from marathon and then the slave disconnected):
>
> 2014-08-25 11:50:34,976 - INFO [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket
> connection from /10.1.100.117:45778
> 2014-08-25 11:50:34,977 - WARN [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@793] - Connection request from old
> client /10.1.100.117:45778; will be dropped if server is in r-o mode
> 2014-08-25 11:50:34,977 - INFO [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@839] - Client attempting to
> establish new session at /10.1.100.117:45778
> 2014-08-25 11:50:34,978 - INFO [SyncThread:0:ZooKeeperServer@595] -
> Established session 0x1480b22f7f0000c with negotiated timeout 10000 for
> client /10.1.100.117:45778
> 2014-08-25 11:51:05,724 - INFO [ProcessThread(sid:0
> cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException
> when processing sessionid:0x1480b22f7f00001 type:create cxid:0x53faafa9
> zxid:0x49 txntype:-1 reqpath:n/a Error Path:/marathon Error:KeeperErrorCode
> = NodeExists for /marathon
> 2014-08-25 11:51:05,724 - INFO [ProcessThread(sid:0
> cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException
> when processing sessionid:0x1480b22f7f00001 type:create cxid:0x53faafaa
> zxid:0x4a txntype:-1 reqpath:n/a Error Path:/marathon/state
> Error:KeeperErrorCode = NodeExists for /marathon/state
> 2014-08-25 11:51:09,145 - INFO [ProcessThread(sid:0
> cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException
> when processing sessionid:0x1480b22f7f00001 type:create cxid:0x53faafb5
> zxid:0x4d txntype:-1 reqpath:n/a Error Path:/marathon Error:KeeperErrorCode
> = NodeExists for /marathon
> 2014-08-25 11:51:09,146 - INFO [ProcessThread(sid:0
> cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException
> when processing sessionid:0x1480b22f7f00001 type:create cxid:0x53faafb6
> zxid:0x4e txntype:-1 reqpath:n/a Error Path:/marathon/state
> Error:KeeperErrorCode = NodeExists for /marathon/state

