Re: Mesos HA does not work (Failed to recover registrar)

2016-06-06 Thread Chengwei Yang
@Qian, I think you're running into firewall issues. Did you make sure your masters can reach each other? From master A: $ telnet B 5050 I think it fails to connect. Please make sure any firewall is shut down. -- Thanks, Chengwei On Mon, Jun 06, 2016 at 09:06:43PM +0800, Qian Zhang wrote: > I
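
The reachability check suggested in this message (telnet from one master to port 5050 on another) can also be scripted. The following is only a minimal sketch, assuming Python 3 is available on each master; `can_connect` is a hypothetical helper name, and the host names in the usage comment are placeholders rather than values from the thread.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Usage (placeholder host names, run on master A):
#   for peer in ["master-b", "master-c"]:
#       print(peer, can_connect(peer, 5050))
```

A `False` result for a peer that is known to be running would be consistent with the firewall theory, although it does not by itself prove that a firewall is the cause.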

Re: Mesos HA does not work (Failed to recover registrar)

2016-06-06 Thread haosdent
Hi, @Qian Zhang Your issue reminds me of this http://search-hadoop.com/m/0Vlr69BZgz1NlAPP1=Re+Mesos+Masters+Leader+Keeps+Fluctuating which I could not reproduce in my env. I am not sure whether your case is the same as Stefano's or not. On Mon, Jun 6, 2016 at 9:06 PM, Qian Zhang

Re: Mesos HA does not work (Failed to recover registrar)

2016-06-06 Thread Qian Zhang
I deleted everything in the work dir (/var/lib/mesos/master) and tried again, but the same error still happened :-( Thanks, Qian Zhang On Mon, Jun 6, 2016 at 3:03 AM, Jean Christophe “JC” Martin < jch.mar...@gmail.com> wrote: > Qian, > > Zookeeper should be able to reach a quorum with 2, no need

Re: Mesos HA does not work (Failed to recover registrar)

2016-06-05 Thread Dick Davies
The extra zookeepers listed in the second argument will let your mesos master process keep working if its local zookeeper goes down for maintenance. On 5 June 2016 at 13:55, Qian Zhang wrote: >> You need the 2nd command line (i.e. you have to specify all the zk >> nodes on

Re: Mesos HA does not work (Failed to recover registrar)

2016-06-05 Thread Qian Zhang
> > You need the 2nd command line (i.e. you have to specify all the zk > nodes on each master, it's > not like e.g. Cassandra where you can discover other nodes from the > first one you talk to). I have an Open DC/OS environment which has master HA enabled (there are 3 master nodes) and works

Re: Mesos HA does not work (Failed to recover registrar)

2016-06-05 Thread Dick Davies
OK, good - that part looks as expected, you've had a successful election for a leader (and yes that sounds like your zookeeper layer is ok). You need the 2nd command line (i.e. you have to specify all the zk nodes on each master, it's not like e.g. Cassandra where you can discover other nodes
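
To illustrate "specify all the zk nodes on each master", here is a rough sketch that builds a single `zk://` URL listing every ZooKeeper node, in the form accepted by the master's `--zk` flag. The three addresses are the ones from the `zoo.cfg` quoted elsewhere in this thread; the client port 2181 (ZooKeeper's default) and the `/mesos` path are assumptions, and `build_zk_url` is a hypothetical helper, not part of Mesos.

```python
def build_zk_url(hosts, port=2181, path="/mesos"):
    """Build a zk:// URL that lists every ZooKeeper node in the ensemble.

    port and path are assumed defaults; adjust them to the real deployment.
    """
    if not hosts:
        raise ValueError("at least one ZooKeeper host is required")
    endpoints = ",".join(f"{host}:{port}" for host in hosts)
    return f"zk://{endpoints}{path}"


# Addresses taken from the zoo.cfg shown in the thread.
zk_hosts = ["192.168.122.132", "192.168.122.225", "192.168.122.171"]
print(build_zk_url(zk_hosts))
```

Every master would then be started with the same URL, so that losing one ZooKeeper node does not cut a master off from the ensemble.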

Re: Mesos HA does not work (Failed to recover registrar)

2016-06-04 Thread Sivaram Kannan
My 2 cents: is there a possibility of old data in /var/lib/mesos? Can you try deleting the folder /var/lib/mesos on all 3 systems and bringing it up again? On Sat, Jun 4, 2016 at 9:04 PM, Qian Zhang wrote: > I am using the latest Mesos code in git (master branch).

Re: Mesos HA does not work (Failed to recover registrar)

2016-06-04 Thread Jie Yu
Which version are you using? - Jie On Sat, Jun 4, 2016 at 4:34 PM, Qian Zhang wrote: > Thanks Vinod and Dick. > > I think my 3 ZK servers have formed a quorum, each of them has the > following config: > $ cat conf/zoo.cfg > server.1=192.168.122.132:2888:3888 >

Re: Mesos HA does not work (Failed to recover registrar)

2016-06-04 Thread Qian Zhang
Thanks Vinod and Dick. I think my 3 ZK servers have formed a quorum, each of them has the following config: $ cat conf/zoo.cfg server.1=192.168.122.132:2888:3888 server.2=192.168.122.225:2888:3888 server.3=192.168.122.171:2888:3888 autopurge.purgeInterval=6

Re: Mesos HA does not work (Failed to recover registrar)

2016-06-04 Thread Dick Davies
You told the master it needed a quorum of 2 and it's the only one online, so it's bombing out. That's the expected behaviour. You need to start at least 2 zookeepers before it will be a functional group, same for the masters. You haven't mentioned how you set up your zookeeper cluster, so I'm
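
The quorum arithmetic behind this reply can be written out as a small sketch. It assumes the usual rule that the quorum is a strict majority of the configured masters; the function names are hypothetical and only illustrate the reasoning, they are not Mesos APIs.

```python
def majority_quorum(num_masters):
    """Smallest number of masters that forms a strict majority."""
    if num_masters < 1:
        raise ValueError("num_masters must be at least 1")
    return num_masters // 2 + 1


def has_quorum(online_masters, quorum):
    """True if enough masters are online to satisfy the configured quorum."""
    return online_masters >= quorum


quorum = majority_quorum(3)   # 3 masters -> quorum of 2
print(has_quorum(1, quorum))  # a single master online cannot satisfy it
print(has_quorum(2, quorum))  # two masters online can
```

With 3 masters and a quorum of 2, a master started on its own has nothing to replicate with, which matches the behaviour described in the message.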

Re: Mesos HA does not work (Failed to recover registrar)

2016-06-04 Thread Vinod Kone
You need to start all 3 masters simultaneously so that they can reach a quorum. Also, it looks like each master is talking to its local zk server; are you sure the 3 ZK servers are forming a quorum? On Sat, Jun 4, 2016 at 9:42 AM, Qian Zhang wrote: > Hi Folks, > > I am trying
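
One way to check whether the ZK servers have formed a quorum is to ask each one for its mode with ZooKeeper's `srvr` four-letter command. The sketch below assumes the client port is 2181 and that four-letter commands are enabled (newer ZooKeeper releases may require whitelisting them); `parse_mode` and `zk_mode` are hypothetical helper names.

```python
import socket


def parse_mode(srvr_output):
    """Extract the value of the 'Mode:' line from srvr output, or None."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def zk_mode(host, port=2181, timeout=3.0):
    """Send 'srvr' to a ZooKeeper server and return its reported mode."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"srvr")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return parse_mode(b"".join(chunks).decode("utf-8", errors="replace"))


# Usage (addresses from the zoo.cfg quoted in the thread):
#   for host in ["192.168.122.132", "192.168.122.225", "192.168.122.171"]:
#       print(host, zk_mode(host))
```

In a healthy 3-node ensemble one server should report `leader` and the other two `follower`; a server reporting `standalone` is not part of an ensemble at all.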