Ok...so I added hostname=mesosn.x.net to each node's /etc/default/mesos-master, cleared everything in /var/lib/mesos, cleared /mesos in zk and started the mesos services. The elected master is holding steady, but going to a non-leader master in the browser still redirects to the internal ip address of the leading master, instead of mesosn.x.net. Is there a way to force it to use the mesosn.x.net address instead of the internal ip?

From: [email protected]
To: [email protected]
Subject: RE: Lots of master elections
Date: Sat, 4 Jul 2015 14:28:04 +0100

Deleting /var/lib/mesos/* (which is essentially the replicated logs), and deleting the /mesos node in ZK, then restarting everything seems to have improved things. Single election, and no re-elections or FATAL logs. However, going to a non-leader master redirects to the internal ip. Will see what happens if I add hostnames back in.

From: [email protected]
To: [email protected]
Subject: RE: Lots of master elections
Date: Sat, 4 Jul 2015 14:04:34 +0100

Hm...will delete everything in /var/lib/mesos (which are the replicated logs), and retry. Guess I don't need to delete mesos things under /etc, then. Will report back.

Checking the logs, I see that a master is elected but then writes this to FATAL:

F0704 12:52:38.078475 5847 master.cpp:1176] Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins

Then dies. Guess that's kicking off the new election.

-Ashic.

From: [email protected]
To: [email protected]
Subject: RE: Lots of master elections
Date: Sat, 4 Jul 2015 12:47:53 +0000

Based on your configuration, mesos creates files under /var/, in a directory named mesos. Go inside /var and run on the command line:

find . -name '*mesos*'

Sent from my Samsung device

-------- Original message --------
From: Ashic Mahtab <[email protected]>
Date: 04/07/2015 14:34 (GMT+01:00)
To: Apache Mesos <[email protected]>
Subject: RE: Lots of master elections

Thanks for the reply, Niklaos. Extreme noob question...when you say mesos files, which are you referring to? Would I also need to delete the /mesos value in Zookeeper?

From: [email protected]
To: [email protected]
Subject: RE: Lots of master elections
Date: Sat, 4 Jul 2015 12:29:44 +0000

You have to clean the mesos files and restart the masters.

Sent from my Samsung device

-------- Original message --------
From: Ashic Mahtab <[email protected]>
Date: 04/07/2015 14:08 (GMT+01:00)
To: [email protected]
Subject: Lots of master elections

Hello,

Just getting started with Mesos, and in the process of "graduating" from Vagrant to a cluster on Azure. Here's what I have:

* 1 Zookeeper node exposing 2181, running as expected.
* 2 Mesos masters - mesos1.x.net, mesos2.x.net. Both exposing 5050. These have private and public ips.

All nodes are on the same network, and have access to each other. [I'll set up a third master, and add slaves soon.]

It all seems ok, and the web UI works. I can see mesos entries in Zookeeper. However, I'm seeing a couple of things:

* A node is elected master (say, mesos1.x.net). About a minute later, another election is held.
* If the other node wins, in the UI, I get the message that this is no longer the master and am redirected.
* Sometimes the redirection is to mesos2.x.net, and all is fine (except another election soon).
* Sometimes the redirection is to the internal ip of mesos2.x.net, which obviously gets a 404.

I should add that all the nodes are the lowest powered crappy Azure instances you can get.

Is this constant re-election "normal"? Should I specify hostnames or public ips in /etc/default/mesos-master? I tried the latter, but the symptoms remained. Will adding a third master make it work? (I have quorum set to 2.)

Any help will be greatly appreciated.

Thanks,
Ashic.
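
On the quorum question above: the usual guidance is to set the master quorum to a strict majority of the masters. The sketch below only does that arithmetic for the two setups mentioned in this thread (2 masters with quorum 2, and 3 masters with quorum 2). The function names are made up for illustration and are not part of Mesos, and the sketch says nothing about whether the quorum setting caused the re-elections seen here.

```python
def majority_quorum(num_masters: int) -> int:
    """Smallest strict majority of num_masters (hypothetical helper)."""
    if num_masters < 1:
        raise ValueError("need at least one master")
    return num_masters // 2 + 1


def tolerated_failures(num_masters: int, quorum: int) -> int:
    """How many masters can be down while `quorum` of them are still up."""
    return max(num_masters - quorum, 0)


if __name__ == "__main__":
    # The two configurations discussed in the thread.
    for masters, quorum in [(2, 2), (3, 2)]:
        print(
            f"{masters} masters, quorum {quorum}: "
            f"majority is {majority_quorum(masters)}, "
            f"tolerates {tolerated_failures(masters, quorum)} master(s) down"
        )
```

With 2 masters and quorum 2, both masters have to be reachable for a quorum to exist. With 3 masters and quorum 2, one master can be down.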

