OK, please follow me in this strange story. At the beginning I had set up 3 Mesos masters on the same cluster using Mesos 0.27. I then deleted one of these 3 Mesos 0.27 masters and built a Mesos 0.28 master to join the other 2. I get the problem I described: after 10 seconds I get "Failed to connect", but the election of a new leader works fine, because the new leader is 0.27, which is stable. How can I send you the logs?
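Would collecting them like this be enough? This is a minimal sketch assuming a packaged install that starts the master with --log_dir=/var/log/mesos; the paths are my assumption, and with no --log_dir the master only logs to stderr:

    # Bundle the glog files the master writes under --log_dir (path assumed).
    tar czf mesos-master-logs.tar.gz /var/log/mesos/mesos-master.*
    # Also grab the ZooKeeper configuration haosdent asked for (typical path).
    cp /etc/zookeeper/conf/zoo.cfg .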
2016-04-13 23:53 GMT+02:00 Adam Bordelon <[email protected]>:

> See also http://mesos.apache.org/documentation/latest/operational-guide/
>
> On Wed, Apr 13, 2016 at 2:44 PM, Adam Bordelon <[email protected]> wrote:
>
>> Having 2 masters (even with a quorum of 2) is no more useful than having
>> a single master: if one of your 2 masters goes down you lose quorum, and
>> your cluster will fail to recover, since it cannot write state changes to
>> both masters.
>>
>> Setting the quorum to 1 for a cluster with 2 masters would expose you to
>> potential split-brain problems in case of a network partition. You would
>> then have 2 masters that each think they are the leader, and they would
>> be unable to reconcile their differences if the partition ends and they
>> reconnect to each other.
>>
>> It is intended that you have an odd number of masters (similar to the
>> requirement to have an odd number of ZooKeepers), and quorum =
>> ceiling(numMasters / 2); so you would have 1 master (quorum=1), 3 masters
>> (quorum=2), or 5 masters (quorum=3).
>>
>> On Wed, Apr 13, 2016 at 8:44 AM, haosdent <[email protected]> wrote:
>>
>>> It sounds like an issue in 0.28. I created a ticket,
>>> https://issues.apache.org/jira/browse/MESOS-5207, from this to continue
>>> the investigation. @suruchi, if you could attach the logs of your Mesos
>>> masters and your ZooKeeper configuration, I think it would be more
>>> helpful for investigating.
>>>
>>> On Wed, Apr 13, 2016 at 8:24 PM, Stefano Bianchi <[email protected]>
>>> wrote:
>>>
>>>> Thanks for your reply, @haosdent.
>>>> I destroyed my VM and rebuilt Mesos 0.28 with just one master, and now
>>>> it is working.
>>>> I will try to add another master, but for the moment, since I don't
>>>> have many resources on OpenStack, I need to use that VM as a slave.
>>>> However, in the previous configuration the switch between the two
>>>> masters was fine; it was only after the new master had been leading
>>>> for roughly 30 seconds that the "Failed to connect" message appeared.
>>>>
>>>> 2016-04-13 13:08 GMT+02:00 haosdent <[email protected]>:
>>>>
>>>>> Hi, @Stefano. Could you show conf/zoo.cfg? And how many ZooKeeper
>>>>> nodes do you have? And regarding "but after a while again Failed to
>>>>> connect", how long is the interval here? Is it always "a few seconds"?
>>>
>>> --
>>> Best Regards,
>>> Haosdent Huang
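For reference, here is a minimal sketch of the 3-master layout Adam describes; the ZooKeeper hostnames zk1, zk2, zk3 are placeholders, and each master gets quorum = ceiling(3 / 2) = 2:

    # Run this on each of the 3 masters, pointing at the same ZK ensemble
    # (zk1, zk2, zk3 are assumed hostnames, 2181 is the default ZK port).
    mesos-master \
        --zk=zk://zk1:2181,zk2:2181,zk3:2181/mesos \
        --quorum=2 \
        --work_dir=/var/lib/mesos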

