Ashic,

Great that you solved the issue. Could you please clarify your HA
configuration: how many masters do you run, and what --quorum do you use?
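
For context on why the quorum matters, here is a minimal sketch, assuming the
usual majority rule (the quorum should be strictly greater than half the number
of masters). The function names are made up for illustration and are not part
of Mesos:

```python
def majority_quorum(num_masters: int) -> int:
    """Smallest quorum that is a strict majority of num_masters."""
    if num_masters < 1:
        raise ValueError("need at least one master")
    return num_masters // 2 + 1


def tolerated_failures(num_masters: int, quorum: int) -> int:
    """How many masters may be down while a quorum can still be reached."""
    return max(num_masters - quorum, 0)


if __name__ == "__main__":
    # Print the majority quorum and failure tolerance for a few cluster sizes.
    for n in (1, 2, 3, 5):
        q = majority_quorum(n)
        print(f"masters={n} quorum={q} tolerated_failures={tolerated_failures(n, q)}")
```

With 2 masters and a quorum of 2, both masters have to be reachable at all
times; with 3 masters and a quorum of 2, one master can be lost.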

On Sat, Jul 4, 2015 at 5:09 PM, Ashic Mahtab <[email protected]> wrote:

> Hi Nikolaos,
> I'm using an external zk, so didn't need to restart it.
>
> I might have jumped the gun slightly in the last email. It seems
> completely omitting hostname in /etc/default/mesos-master is fine. Simply
> having a file called hostname in /etc/mesos-master with the desired
> hostname as the content seems to fix it for the web UI redirects. I had the
> file on the host I was setting up scripts with, but forgot to add that step
> for the others, hence the private ip redirects.
>
> So, to summarise, I did the following:
> * Stopped the Mesos masters.
> * Cleared /var/lib/mesos/*.
> * For each node, added /etc/mesos-master/hostname with the content of the
> file being the fully qualified hostname.
> * Deleted the /mesos node in zk (though I don't know if this is necessary).
> * Restarted each node.
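
The hostname-file step above can be sketched as follows. This assumes a
packaging in which each file under /etc/mesos-master is turned into a master
flag of the same name, so a file called hostname ends up as --hostname. The
helper name write_hostname_file and the example hostname are illustrative
only, and the demo writes to a temporary directory instead of /etc:

```python
import socket
import tempfile
from pathlib import Path


def write_hostname_file(conf_dir: str, fqdn: str = "") -> Path:
    """Write a file named 'hostname' in conf_dir containing the given FQDN.

    If fqdn is empty, fall back to what the local resolver reports.
    """
    conf = Path(conf_dir)
    conf.mkdir(parents=True, exist_ok=True)
    name = fqdn or socket.getfqdn()
    target = conf / "hostname"
    target.write_text(name + "\n")
    return target


if __name__ == "__main__":
    # Demo against a throwaway directory; on a real master the directory
    # would be /etc/mesos-master and the script would need root.
    demo_dir = tempfile.mkdtemp()
    path = write_hostname_file(demo_dir, "mesos1.x.net")
    print(path, "->", path.read_text().strip())
```

The file has to be present on every master, otherwise the masters without it
may still advertise their private IP in the web UI redirects.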
>
> Again, your guidance has helped greatly.
>
> Cheers,
> Ashic.
>
> ------------------------------
> From: [email protected]
> To: [email protected]
> Subject: Re: Lots of master elections
> Date: Sat, 4 Jul 2015 14:25:27 +0000
>
>
> Hi,
> in my case, in order to make it work, I do the following:
> 1) stop ZooKeeper and Mesos
> 2) clean /var/lib/mesos/* and /var/lib/zookeeper/version-X
> 3) restart the ZooKeeper nodes
> 4) restart all Mesos masters
>
>
>        *Nikolaos Ballas*  |  Software Development Manager
>
>
>
>
>  On 04 Jul 2015, at 15:04, Ashic Mahtab <[email protected]> wrote:
>
>  Hm...will delete everything in /var/lib/mesos (which are replicated
> logs), and retry. Guess I don't need to delete mesos things under /etc,
> then. Will report back. Checking the logs, I see that a master is elected
> but then writes this to FATAL:
>
>  F0704 12:52:38.078475  5847 master.cpp:1176] Recovery failed: Failed to
> recover registrar: Failed to perform fetch within 1mins
>
>  Then dies. Guess that's kicking off the new election.
>
>  -Ashic.
>
>  ------------------------------
> From: [email protected]
> To: [email protected]
> Subject: RE: Lots of master elections
> Date: Sat, 4 Jul 2015 12:47:53 +0000
>
> Depending on your configuration, Mesos creates its files under /var/, in a
> directory named mesos. Go into /var and run on the command line:
> find . -name '*mesos*'
>
>
>
>
>
> -------- Original message --------
> From: Ashic Mahtab <[email protected]>
> Date: 04/07/2015 14:34 (GMT+01:00)
> To: Apache Mesos <[email protected]>
> Subject: RE: Lots of master elections
>
>  Thanks for the reply, Nikolaos. Extreme noob question...when you say mesos
> files, which are you referring to? Would I also need to delete the /mesos
> value in Zookeeper?
>
>  ------------------------------
> From: [email protected]
> To: [email protected]
> Subject: RE: Lots of master elections
> Date: Sat, 4 Jul 2015 12:29:44 +0000
>
> You have to clean the Mesos files and restart the masters.
>
>
>
>
>
> -------- Original message --------
> From: Ashic Mahtab <[email protected]>
> Date: 04/07/2015 14:08 (GMT+01:00)
> To: [email protected]
> Subject: Lots of master elections
>
>  Hello,
> Just getting started with Mesos, and in the process of "graduating" from
> Vagrant to a cluster on Azure. Here's what I have:
>
>  * 1 Zookeeper node exposing 2181, running as expected.
> * 2 Mesos masters - mesos1.x.net, mesos2.x.net. Both exposing 5050. These
> have private and public ips. All nodes are on the same network, and have
> access to each other.
>
>  [I'll set up a third master, and add slaves soon.]
>
>  It all seems ok, and the web UI works. I can see mesos entries in
> Zookeeper. However, I'm seeing a couple of things:
>
>  * A node is elected master (say, mesos1.x.net), and about a minute later
> another election is held.
> * If the other node wins, in the UI, I get the message that this is no
> longer the master and am redirected.
> * Sometimes the redirection is to mesos2.x.net, and all is fine (except
> another election soon).
> * Sometimes the redirection is to the internal ip of mesos2.x.net, which
> obviously gets a 404.
>
>  I should add that all the nodes are the lowest powered crappy Azure
> instances you can get.
>
>  Is this constant re-election "normal"? Should I specify hostnames or
> public ips in /etc/default/mesos-master? I tried the latter, but the
> symptoms remained. Will adding a third master make it work? (I have
> quorum set to 2).
>
>  Any help will be greatly appreciated.
>
>  Thanks,
> Ashic.
>
>
>
