Golden rule: don't use an even number of members in a quorum system.

You need a quorum to function, so with 2 masters and quorum=2 you can never
take a member down. With 2 masters and quorum=1, you're asking for
"split brain".

(This is exactly the same with ZooKeeper, by the way; it's also a quorum system.)

If you have 1 master, quorum=1
If you have 3 masters, quorum=2
If you have 5 masters, quorum=3

and so on (the quorum is a strict majority of the masters). Try that and see if it helps.
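Written out, a strict majority of N masters is floor(N/2) + 1, and the number of masters that can be down while a quorum still exists is N minus that. A small shell sketch makes the even-number problem visible; the `quorum` function is a hypothetical helper for illustration, not part of Mesos:

```shell
#!/bin/sh
# Hypothetical helper (not a Mesos tool): majority quorum for n masters,
# i.e. floor(n/2) + 1 using POSIX integer arithmetic.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# For each cluster size, print the quorum and how many masters can be
# down while a quorum is still reachable.
for n in 1 2 3 4 5; do
  q=$(quorum "$n")
  echo "masters=$n quorum=$q tolerated_failures=$(( n - q ))"
done
```

Two masters tolerate no failures, the same as one, and four tolerate only one, the same as three, so the extra even member buys nothing. For the single-master setup in the original question, this means starting the master with `--quorum=1` rather than `--quorum=2`.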


On 7 November 2014 09:42, sujinzhao <[email protected]> wrote:
> In fact, I also tried launching 2 masters on two separate machines. At
> first, one of them was successfully elected leader, and both of them
> printed several lines of messages:
>
> Replica in EMPTY status received a broadcasted recover request
> Received a recover response from a replica in EMPTY status
>
> Then the leading master aborted after printing these errors:
>
> Recovery failed: Failed to recover registrar: Failed to perform fetch within
> 1mins
> *** Check failure stack trace: ***
> @ 0x7f3c1ea105cd google::LogMessage::Fail()
> ..............................
>
> Next, the second master became the new leader. It also tried to recover
> from the registrar, but it failed too and printed these errors before aborting:
>
> Recovery failed: Failed to recover registrar: Failed to perform fetch within
> 1mins
> *** Check failure stack trace: ***
> @ 0x7f3c1ea105cd google::LogMessage::Fail()
> ...............................
>
> So I guess it's not a ZooKeeper problem; it's that the elected leader cannot
> recover from the registrar. Could somebody kindly explain some principles
> of the Mesos registry, or give me some suggestions?
>
> THANKS.
>
> "david.j.palaitis" <[email protected]> wrote:
>
>
> With a single master, you should not set quorum=2.
>
>
> -------- Original message --------
> From: sujinzhao <[email protected]>
> Date: 11/06/2014 4:01 PM (GMT-05:00)
> To: [email protected]
> Cc:
> Subject: Problems of running mesos-0.20.0 with zookeeper
>
> Hi all,
>
> I set up a ZooKeeper ensemble on three machines (zoo1, zoo2, zoo3), and
> installed 1 Mesos master and 2 slaves on another three nodes. I tried to run
> the master and slaves with:
> ./mesos-master.sh --ip=master-ip \
>     --zk=zk://zoo1:2181,zoo2:2181,zoo3:2181/mesos --quorum=2
>
> ./mesos-slave.sh --ip=slave-ip \
>     --master=zk://zoo1:2181,zoo2:2181,zoo3:2181/mesos
>
> I also created the /mesos znode before running the above commands, but I got
> the following error:
>
> Recovering from registrar
> Recovering registrar
> Recovery failed: Failed to recover registrar: Failed to perform fetch within
> 1mins
> *** Check failure stack trace: ***
>     @  0x7f3c1ea105cd google::LogMessage::Fail()
> ...............................
>
> After reading the master log, I found that before the error the master had
> already been elected successfully, but the leader failed to recover from the
> registrar, so I guess this error has little to do with ZooKeeper.
>
> After googling, I found that other people have also encountered this problem,
> but with no solution. I have also ruled out passwordless SSH between the
> master/slaves and the ZooKeeper servers as a possible cause.
>
> So, could somebody kindly tell me how to solve this error? Any
> suggestions will be appreciated.
>
> THANKS.
