See also http://mesos.apache.org/documentation/latest/operational-guide/

On Wed, Apr 13, 2016 at 2:44 PM, Adam Bordelon <[email protected]> wrote:

> Having 2 masters (even with a quorum of 2) is no more useful than having a
> single master, since if one of your 2 masters goes down you lose quorum,
> and your cluster will fail to recover, since it cannot write state changes
> to both masters.
>
> Setting the quorum to 1 for a cluster with 2 masters would expose you to
> potential split-brain problems, in case of a network partition. You would
> then have 2 masters that each think they are the leader, and they would be
> unable to reconcile their differences if the partition ends and they
> reconnect to each other.
>
> It is intended that you will have an odd number of masters (similar to the
> requirement to have an odd number of ZKs), and quorum = ceiling(numMasters
> / 2); so you would have 1 master (quorum=1), 3 masters (quorum=2), or 5
> masters (quorum=3).
>
> On Wed, Apr 13, 2016 at 8:44 AM, haosdent <[email protected]> wrote:
>
>> It sounds like an issue in 0.28. I created a ticket,
>> https://issues.apache.org/jira/browse/MESOS-5207, from this to continue
>> investigating. @suruchi, if you could attach the logs of the Mesos
>> masters and your ZooKeeper configuration, I think it would be more
>> helpful for investigating.
>>
>> On Wed, Apr 13, 2016 at 8:24 PM, Stefano Bianchi <[email protected]>
>> wrote:
>>
>>> Thanks for your reply @haosdent.
>>> I destroyed my VM and rebuilt Mesos 0.28 with just one master, and now
>>> it is working.
>>> I will try to add another master, but for the moment, since I don't have
>>> many resources on OpenStack, I need to use that VM as a slave.
>>> However, in the previous configuration the switch between the two masters
>>> was OK; it was only after a master had been leading for about 30 seconds
>>> that the "Failed to connect" message appeared.
>>>
>>> 2016-04-13 13:08 GMT+02:00 haosdent <[email protected]>:
>>>
>>>> Hi @Stefano, could you show conf/zoo.cfg? And how many ZooKeeper nodes
>>>> do you have? And regarding "but after a while again Failed to connec",
>>>> how long is the interval here? Is it always "few seconds"?
>>
>> --
>> Best Regards,
>> Haosdent Huang
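
To make the quorum arithmetic from Adam's message concrete, here is a minimal sketch in Python. The function names are hypothetical and are not part of Mesos. It computes a strict majority as floor(n / 2) + 1, which is an assumption on my side: it agrees with ceiling(numMasters / 2) for odd master counts, and gives 2 for a 2-master cluster, which matches the "quorum of 2" case described above.

```python
def majority_quorum(num_masters: int) -> int:
    """Smallest number of masters that forms a strict majority.

    Hypothetical helper for illustration only; not a Mesos API.
    """
    if num_masters < 1:
        raise ValueError("need at least one master")
    return num_masters // 2 + 1


def tolerated_failures(num_masters: int) -> int:
    """How many masters can be lost while a majority quorum still exists."""
    return num_masters - majority_quorum(num_masters)


if __name__ == "__main__":
    # Compare the master counts discussed in the thread.
    for n in (1, 2, 3, 5):
        print(
            f"masters={n} quorum={majority_quorum(n)} "
            f"tolerated_failures={tolerated_failures(n)}"
        )
```

Under these assumptions, 1 and 2 masters both tolerate zero failures, which is why the second master adds no availability, while 3 masters tolerate one failure and 5 tolerate two.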

