Got it. I was confused by your first email where you said you have 2 masters.
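
For anyone reading this thread later, here is a minimal sketch of the arithmetic behind "3 masters, quorum 2". It assumes the usual rule that the quorum should be a strict majority of the masters. The helper names (majority_quorum, tolerated_failures) are made up for illustration and are not part of Mesos.

```python
def majority_quorum(num_masters: int) -> int:
    """Smallest count that is a strict majority of num_masters."""
    if num_masters < 1:
        raise ValueError("need at least one master")
    return num_masters // 2 + 1


def tolerated_failures(num_masters: int) -> int:
    """Masters that can be lost while a majority quorum is still reachable."""
    return num_masters - majority_quorum(num_masters)


# Illustrative cluster sizes only; 2 and 3 are the sizes mentioned in the thread.
for n in (1, 2, 3, 5):
    print(f"{n} masters: quorum {majority_quorum(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

Under that assumption, 2 masters with quorum 2 leave no room for a single master being down or slow, while 3 masters with quorum 2 tolerate one.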
On Tue, Jul 7, 2015 at 4:40 AM, Ashic Mahtab <[email protected]> wrote:

> Sure, Alex.
>
> 3 masters. Quorum is 2.
>
> ------------------------------
> Date: Mon, 6 Jul 2015 19:44:28 +0200
> Subject: Re: Lots of master elections
> From: [email protected]
> To: [email protected]
>
>
> Ashic,
>
> great that you solved the issue. Could you please clarify what HA
> configuration you have: how many masters and what --quorum you use?
>
> On Sat, Jul 4, 2015 at 5:09 PM, Ashic Mahtab <[email protected]> wrote:
>
> Hi Nikolaos,
> I'm using an external zk, so didn't need to restart it.
>
> I might have jumped the gun slightly in the last email. It seems
> completely omitting hostname in /etc/default/mesos-master is fine. Simply
> having a file called hostname in /etc/mesos-master with the desired
> hostname as the content seems to fix it for the web UI redirects. I had the
> file on the host I was setting up scripts with, but forgot to add that step
> for the others, hence the private ip redirects.
>
> So, to summarise, I did the following:
> * Stopped mesos masters.
> * cleared /var/lib/mesos/*
> * for each node, added /etc/mesos-master/hostname with the content of the
>   file being the fully qualified hostname.
> * deleted the /mesos node in zk (though I don't know if this is necessary).
> * restarted each node.
>
> Again, your guidance has helped greatly.
>
> Cheers,
> Ashic.
>
> ------------------------------
> From: [email protected]
> To: [email protected]
> Subject: Re: Lots of master elections
> Date: Sat, 4 Jul 2015 14:25:27 +0000
>
>
> Hi,
> in my case, in order to make it work, I do the following:
> 1) stop zookeeper - mesos
> 2) clean /var/lib/mesos/* /var/lib/zookeeper/version-X
> 3) restart zookeeper nodes
> 4) restart all mesos masters
>
>
> *Nikolaos Ballas* | Software Development Manager
>
> Technology Nexus S.a.r.l.
> 2-4 Rue Eugene Rupert
> 2453 Luxembourg
> Delivery address: 2-3 Rue Eugene Rupert,Vertigo Polaris Building
> Tel: + 3522619113580
> [email protected] <[email protected]> | nexusgroup.com
> <http://www.nexusgroup.com/>
> *LinkedIn.com <http://www.linkedin.com/company/nexus-technology>* | *Twitter
> <http://www.twitter.com/technologynexus>* | *Facebook.com
> <https://www.facebook.com/pages/Technology-Nexus/133756470003189>*
>
>
> On 04 Jul 2015, at 15:04, Ashic Mahtab <[email protected]> wrote:
>
> Hm...will delete everything in /var/lib/mesos (which are replicated
> logs), and retry. Guess I don't need to delete mesos things under /etc,
> then. Will report back. Checking the logs, I see that a master is elected
> but then writes this to FATAL:
>
> F0704 12:52:38.078475 5847 master.cpp:1176] Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
>
> Then dies. Guess that's kicking off the new election.
>
> -Ashic.
>
> ------------------------------
> From: [email protected]
> To: [email protected]
> Subject: RE: Lots of master elections
> Date: Sat, 4 Jul 2015 12:47:53 +0000
>
> Depending on your configuration, mesos creates its files under /var/, in a
> directory named mesos. Go into /var and run on the command line:
> find . -name '*mesos*'
>
>
> Sent from my Samsung device
>
>
> -------- Original message --------
> From: Ashic Mahtab <[email protected]>
> Date: 04/07/2015 14:34 (GMT+01:00)
> To: Apache Mesos <[email protected]>
> Subject: RE: Lots of master elections
>
> Thanks for the reply, Nikolaos. Extreme noob question...when you say mesos
> files, which are you referring to? Would I also need to delete the /mesos
> value in Zookeeper?
>
> ------------------------------
> From: [email protected]
> To: [email protected]
> Subject: RE: Lots of master elections
> Date: Sat, 4 Jul 2015 12:29:44 +0000
>
> You have to clean the mesos files and restart the masters
>
>
> Sent from my Samsung device
>
>
> -------- Original message --------
> From: Ashic Mahtab <[email protected]>
> Date: 04/07/2015 14:08 (GMT+01:00)
> To: [email protected]
> Subject: Lots of master elections
>
> Hello,
> Just getting started with Mesos, and in the process of "graduating" from
> Vagrant to a cluster on Azure. Here's what I have:
>
> * 1 Zookeeper node exposing 2181, running as expected.
> * 2 Mesos masters - mesos1.x.net, mesos2.x.net. Both exposing 5050. These
>   have private and public ips. All nodes are on the same network, and have
>   access to each other.
>
> [I'll set up a third master, and add slaves soon.]
>
> It all seems ok, and the web UI works. I can see mesos entries in
> Zookeeper. However, I'm seeing a couple of things:
>
> * A node is elected master (say, mesos1.x.net), and about a minute later
>   another election is held.
> * If the other node wins, in the UI, I get the message that this is no
>   longer the master and am redirected.
> * Sometimes the redirection is to mesos2.x.net, and all is fine (except
>   another election soon).
> * Sometimes the redirection is to the internal ip of mesos2.x.net, which
>   obviously gets a 404.
>
> I should add that all the nodes are the lowest powered crappy Azure
> instances you can get.
>
> Is this constant re-election "normal"? Should I specify hostnames or
> public ips in /etc/default/mesos-master? I tried the latter, but the
> symptoms remained. Will adding a third master make it work? (I have
> quorum set to 2).
>
> Any help will be greatly appreciated.
>
> Thanks,
> Ashic.
>
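
The per-node hostname step from the summary earlier in the thread (a file named hostname under /etc/mesos-master whose content is the fully qualified hostname) could be scripted roughly as below. This is only a sketch: write_hostname_file is a hypothetical helper, the example hostname is the placeholder used in the thread, and the demo writes into a temporary directory rather than the real /etc/mesos-master.

```python
import tempfile
from pathlib import Path


def write_hostname_file(config_dir: Path, fqdn: str) -> Path:
    """Create <config_dir>/hostname containing the given hostname.

    Hypothetical helper: on a real master, config_dir would be
    /etc/mesos-master (per the thread) and root access would be needed.
    """
    if not fqdn.strip():
        raise ValueError("hostname must not be empty")
    config_dir.mkdir(parents=True, exist_ok=True)
    target = config_dir / "hostname"
    # Trailing newline mirrors what `echo name > file` would produce.
    target.write_text(fqdn.strip() + "\n")
    return target


# Demo against a throwaway directory so nothing on the system is touched.
with tempfile.TemporaryDirectory() as tmp:
    path = write_hostname_file(Path(tmp) / "mesos-master", "mesos1.x.net")
    print(path.name, "->", path.read_text().strip())
```

The other steps in the summary (stopping the masters, clearing /var/lib/mesos/*, removing the /mesos node in ZooKeeper, restarting) are left out here, since they depend on how the services were installed.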

