Ah, that was while I was setting things up. I guess if one master goes down, I'm temporarily in that scenario, hence I'd expect it to work, which it does. I think the issue was that the apt-get install started the service, then I changed settings, and the cached information needed clearing out.
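For reference, the master-count and quorum figures discussed in this thread can be checked with a little arithmetic. This is only a sketch, assuming the usual majority rule (quorum strictly greater than half the configured masters); the function names are made up for illustration and are not part of Mesos.

```shell
#!/bin/sh
# Majority-quorum arithmetic: a sketch, assuming the quorum must be a strict
# majority of the configured masters. Function names are hypothetical.

# Smallest quorum that is a strict majority of N masters.
majority_quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Number of masters that can be lost while Q out of N are still reachable.
tolerated_failures() {
    echo $(( $1 - $2 ))
}

for n in 2 3 5; do
    q=$(majority_quorum "$n")
    echo "masters=$n quorum=$q tolerated_failures=$(tolerated_failures "$n" "$q")"
done
```

Under that assumption, 3 masters with quorum 2 can lose one master, while 2 masters with quorum 2 cannot lose any.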
Date: Tue, 7 Jul 2015 10:33:20 +0200
Subject: Re: Lots of master elections
From: [email protected]
To: [email protected]

Got it. I was confused by your first email where you said you have 2 masters.

On Tue, Jul 7, 2015 at 4:40 AM, Ashic Mahtab <[email protected]> wrote:

Sure, Alex. 3 masters. Quorum is 2.

Date: Mon, 6 Jul 2015 19:44:28 +0200
Subject: Re: Lots of master elections
From: [email protected]
To: [email protected]

Ashic, great that you solved the issue. Could you please clarify what HA configuration you have: how many masters and what --quorum you use?

On Sat, Jul 4, 2015 at 5:09 PM, Ashic Mahtab <[email protected]> wrote:

Hi Nikolaos,

I'm using an external zk, so didn't need to restart it. I might have jumped the gun slightly in the last email. It seems completely omitting hostname in /etc/default/mesos-master is fine. Simply having a file called hostname in /etc/mesos-master with the desired hostname as the content seems to fix it for the web UI redirects. I had the file on the host I was setting up scripts with, but forgot to add that step for the others, hence the private ip redirects.

So, to summarise, I did the following:
* Stopped mesos masters.
* Cleared /var/lib/mesos/*
* For each node, added /etc/mesos-master/hostname with the content of the file being the fully qualified hostname.
* Deleted the /mesos node in zk (though I don't know if this is necessary).
* Restarted each node.

Again, your guidance has helped greatly.

Cheers,
Ashic.

From: [email protected]
To: [email protected]
Subject: Re: Lots of master elections
Date: Sat, 4 Jul 2015 14:25:27 +0000

Hi, in my case, in order to make it work, I do the following:
1) stop zookeeper - mesos
2) clean /var/lib/mesos/* /var/lib/zookeeper/version-X
3) restart zookeeper nodes
4) restart all mesos masters

Nikolaos Ballas | Software Development Manager
Technology Nexus S.a.r.l.
2-4 Rue Eugene Rupert 2453 Luxembourg
Delivery address: 2-3 Rue Eugene Rupert, Vertigo Polaris Building
Tel: + 3522619113580
[email protected] | nexusgroup.com
LinkedIn.com | Twitter | Facebook.com

On 04 Jul 2015, at 15:04, Ashic Mahtab <[email protected]> wrote:

Hm...will delete everything in /var/lib/mesos (which holds the replicated log), and retry. Guess I don't need to delete mesos things under /etc, then. Will report back.

Checking the logs, I see that a master is elected but then writes this to FATAL:

    F0704 12:52:38.078475 5847 master.cpp:1176] Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins

Then it dies. Guess that's kicking off the new election.

-Ashic.

From: [email protected]
To: [email protected]
Subject: RE: Lots of master elections
Date: Sat, 4 Jul 2015 12:47:53 +0000

Based on your configuration, mesos creates files under /var, in a directory named mesos. Go inside /var and run on the command line:

    find . -name '*mesos*'

Sent from my Samsung device

-------- Original message --------
From: Ashic Mahtab <[email protected]>
Date: 04/07/2015 14:34 (GMT+01:00)
To: Apache Mesos <[email protected]>
Subject: RE: Lots of master elections

Thanks for the reply, Nikolaos. Extreme noob question...when you say mesos files, which are you referring to? Would I also need to delete the /mesos value in Zookeeper?

From: [email protected]
To: [email protected]
Subject: RE: Lots of master elections
Date: Sat, 4 Jul 2015 12:29:44 +0000

You have to clean the mesos files and restart the masters

Sent from my Samsung device

-------- Original message --------
From: Ashic Mahtab <[email protected]>
Date: 04/07/2015 14:08 (GMT+01:00)
To: [email protected]
Subject: Lots of master elections

Hello,

Just getting started with Mesos, and in the process of "graduating" from Vagrant to a cluster on Azure. Here's what I have:
* 1 Zookeeper node exposing 2181, running as expected.
* 2 Mesos masters - mesos1.x.net, mesos2.x.net. Both exposing 5050.
These have private and public IPs.

All nodes are on the same network, and have access to each other. [I'll set up a third master, and add slaves soon.]

It all seems ok, and the web UI works. I can see mesos entries in Zookeeper. However, I'm seeing a couple of things:
* A node is elected master (say, mesos1.x.net), and about a minute later, another election is held.
* If the other node wins, in the UI, I get the message that this is no longer the master and am redirected.
* Sometimes the redirection is to mesos2.x.net, and all is fine (except another election soon).
* Sometimes the redirection is to the internal IP of mesos2.x.net, which obviously gets a 404.

I should add that all the nodes are the lowest powered crappy Azure instances you can get.

Is this constant re-election "normal"? Should I specify hostnames or public IPs in /etc/default/mesos-master? I tried the latter, but the symptoms remained. Will adding a third master make it work? (I have quorum set to 2.)

Any help will be greatly appreciated.

Thanks,
Ashic.
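The cleanup summarised earlier in the thread (stop the masters, clear /var/lib/mesos, write the hostname file, delete the /mesos node in zk, restart) could be scripted roughly as below. This is a sketch under several assumptions that do not come from the thread: the masters are controlled with `service mesos-master stop|start`, `zkCli.sh` from ZooKeeper 3.4 is on the PATH (its `rmr` command removes a node recursively), and `zk.example.net:2181` is a placeholder address. The function names are hypothetical, and `DRY_RUN` defaults to 1 so that nothing is changed unless it is explicitly set to 0.

```shell
#!/bin/sh
# Sketch of the master reset described in the thread. Assumed, not from the
# thread: service name "mesos-master", zkCli.sh on PATH, placeholder ZK address.

DRY_RUN="${DRY_RUN:-1}"                                   # 1 = only print commands
WORK_DIR="${WORK_DIR:-/var/lib/mesos}"                    # replicated log location
HOSTNAME_FILE="${HOSTNAME_FILE:-/etc/mesos-master/hostname}"
ZK_SERVER="${ZK_SERVER:-zk.example.net:2181}"             # placeholder address

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Run on each master: stop it, clear its work dir, write its FQDN.
stop_and_clean_master() {
    fqdn="$1"
    run service mesos-master stop
    run find "$WORK_DIR" -mindepth 1 -delete
    if [ "$DRY_RUN" = "1" ]; then
        echo "would write: $fqdn -> $HOSTNAME_FILE"
    else
        printf '%s\n' "$fqdn" > "$HOSTNAME_FILE"
    fi
}

# Run once, while all masters are stopped (the thread notes this step may
# not be necessary).
remove_mesos_znode() {
    run zkCli.sh -server "$ZK_SERVER" rmr /mesos
}

# Run on each master after the cleanup.
start_master() {
    run service mesos-master start
}
```

A possible order of use: call `stop_and_clean_master "$(hostname -f)"` on every master, then `remove_mesos_znode` once, then `start_master` on every master. Reviewing the dry-run output before setting `DRY_RUN=0` is advisable, since the work directory is emptied.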

