I'm not sure what the best practice is, but I use the /etc/mesos* method, as I find it more explicit.
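For concreteness, here is roughly what that looks like with the addresses from this thread (a sketch, assuming the Mesosphere packages' mesos-init-wrapper, which turns each file under /etc/mesos-master or /etc/mesos-slave into the flag of the same name; the restart commands may differ depending on your init setup):

    # On the master (10.1.100.116): the file name becomes the flag name and
    # the file contents become its value, so this sets --ip=10.1.100.116.
    echo 10.1.100.116 | sudo tee /etc/mesos-master/ip
    sudo service mesos-master restart

    # Same pattern on the slave (10.1.100.117), via /etc/mesos-slave.
    echo 10.1.100.117 | sudo tee /etc/mesos-slave/ip
    sudo service mesos-slave restart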
On 26 August 2014 10:38, Frank Hinek <[email protected]> wrote:

> Vinod: bingo! I’ve spent 2 days trying to figure this out. The only
> interfaces on the VMs were eth0 and lo—interesting that it picked the
> loopback automatically, or that the tutorials didn’t note this.
>
> Ryan: Is it considered better practice to modify /etc/default/mesos-master
> or to write the IP to /etc/mesos-master/ip?
>
> On August 25, 2014 at 8:31:42 PM, Ryan Thomas ([email protected]) wrote:
>
> If you're using the mesos-init-wrapper you can write the IP to
> /etc/mesos-master/ip and that flag will be set. This goes for all the
> flags, and can be done for the slave as well in /etc/mesos-slave.
>
> On 26 August 2014 10:18, Vinod Kone <[email protected]> wrote:
>
>> From the logs, it looks like the master is binding to its loopback
>> address (127.0.0.1) and publishing that to ZK. So the slave is trying to
>> reach the master on its loopback interface, which is failing.
>>
>> Start the master with the "--ip" flag set to its visible IP
>> (10.1.100.116). Mesosphere probably has a file
>> (/etc/defaults/mesos-master?) to set these flags.
>>
>> On Mon, Aug 25, 2014 at 3:26 PM, Frank Hinek <[email protected]> wrote:
>>
>>> Logs attached from master, slave, and zookeeper after a reboot of both
>>> nodes.
>>>
>>> On August 25, 2014 at 1:14:07 PM, Vinod Kone ([email protected]) wrote:
>>>
>>> What do the master and slave logs say?
>>>
>>> On Mon, Aug 25, 2014 at 9:03 AM, Frank Hinek <[email protected]> wrote:
>>>
>>>> I was able to get a single-node environment set up on Ubuntu 14.04.1
>>>> following this guide: http://mesosphere.io/learn/install_ubuntu_debian/
>>>>
>>>> The single slave registered with the master via the local Zookeeper
>>>> and I could run basic commands by posting to Marathon.
>>>>
>>>> I then tried to build a multi-node cluster following this guide:
>>>> http://mesosphere.io/docs/mesosphere/getting-started/cloud-install/
>>>>
>>>> The guide walks you through using the Mesosphere packages to install
>>>> Mesos, Marathon, and Zookeeper on one node that will be the master, and
>>>> just Mesos on the slave. You then disable automatic start of mesos-slave
>>>> on the master, mesos-master on the slave, and zookeeper on the slave.
>>>> It ends up looking like:
>>>>
>>>> NODE 1 (MASTER):
>>>> - IP Address: 10.1.100.116
>>>> - mesos-master
>>>> - marathon
>>>> - zookeeper
>>>>
>>>> NODE 2 (SLAVE):
>>>> - IP Address: 10.1.100.117
>>>> - mesos-slave
>>>>
>>>> The issue I’m running into is that the slave is rarely able to register
>>>> with the master via Zookeeper. I can never run any jobs from Marathon
>>>> (just trying a simple sleep 5 command). Even when the slave does
>>>> register, the Mesos UI shows 1 “Deactivated” slave — it never goes
>>>> active.
>>>>
>>>> Here are the values I have for /etc/mesos/zk:
>>>>
>>>> MASTER: zk://10.1.100.116:2181/mesos
>>>> SLAVE: zk://10.1.100.116:2181/mesos
>>>>
>>>> Any ideas of what to troubleshoot? Would greatly appreciate pointers.
>>>>
>>>> Environment details:
>>>> - Ubuntu Server 14.04.1 running as VMs on ESXi 5.5U1
>>>> - Mesos: 0.20.0
>>>> - Marathon: 0.6.1
>>>>
>>>> There are no apparent connectivity issues, and I’m not having any
>>>> problems with other VMs on the ESXi host. All VM-to-VM communication is
>>>> on the same VLAN and within the same host.
>>>>
>>>> Zookeeper log on master (slave briefly registered, so I tried to run a
>>>> sleep 5 command from Marathon and then the slave disconnected):
>>>>
>>>> 2014-08-25 11:50:34,976 - INFO [NIOServerCxn.Factory:
>>>> 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket
>>>> connection from /10.1.100.117:45778
>>>> 2014-08-25 11:50:34,977 - WARN [NIOServerCxn.Factory:
>>>> 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@793] - Connection request from
>>>> old client /10.1.100.117:45778; will be dropped if server is in r-o
>>>> mode
>>>> 2014-08-25 11:50:34,977 - INFO [NIOServerCxn.Factory:
>>>> 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@839] - Client attempting to
>>>> establish new session at /10.1.100.117:45778
>>>> 2014-08-25 11:50:34,978 - INFO [SyncThread:0:ZooKeeperServer@595] -
>>>> Established session 0x1480b22f7f0000c with negotiated timeout 10000 for
>>>> client /10.1.100.117:45778
>>>> 2014-08-25 11:51:05,724 - INFO [ProcessThread(sid:0
>>>> cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException
>>>> when processing sessionid:0x1480b22f7f00001 type:create cxid:0x53faafa9
>>>> zxid:0x49 txntype:-1 reqpath:n/a Error Path:/marathon
>>>> Error:KeeperErrorCode = NodeExists for /marathon
>>>> 2014-08-25 11:51:05,724 - INFO [ProcessThread(sid:0
>>>> cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException
>>>> when processing sessionid:0x1480b22f7f00001 type:create cxid:0x53faafaa
>>>> zxid:0x4a txntype:-1 reqpath:n/a Error Path:/marathon/state
>>>> Error:KeeperErrorCode = NodeExists for /marathon/state
>>>> 2014-08-25 11:51:09,145 - INFO [ProcessThread(sid:0
>>>> cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException
>>>> when processing sessionid:0x1480b22f7f00001 type:create cxid:0x53faafb5
>>>> zxid:0x4d txntype:-1 reqpath:n/a Error Path:/marathon
>>>> Error:KeeperErrorCode = NodeExists for /marathon
>>>> 2014-08-25 11:51:09,146 - INFO [ProcessThread(sid:0
>>>> cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException
>>>> when processing sessionid:0x1480b22f7f00001 type:create cxid:0x53faafb6
>>>> zxid:0x4e txntype:-1 reqpath:n/a Error Path:/marathon/state
>>>> Error:KeeperErrorCode = NodeExists for /marathon/state
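A couple of quick checks that would have surfaced the loopback binding sooner (a sketch, assuming the default master port 5050 and the /mesos path from /etc/mesos/zk above; the zkCli.sh path is the one installed by Ubuntu's zookeeper package and may differ on other systems):

    # Confirm the master is advertising a routable address rather than
    # 127.0.0.1 (state.json reports the master's pid and hostname).
    curl http://10.1.100.116:5050/master/state.json

    # List what the master registered under the ZooKeeper path from
    # /etc/mesos/zk.
    /usr/share/zookeeper/bin/zkCli.sh -server 10.1.100.116:2181 ls /mesos

    # The non-wrapper equivalent of the ip file: pass --ip explicitly when
    # starting the master (flag names as of Mesos 0.20).
    mesos-master --ip=10.1.100.116 --zk=zk://10.1.100.116:2181/mesos \
        --quorum=1 --work_dir=/var/lib/mesos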

