You could try attaching them to this thread, but ideally we'd want them attached to the JIRA that haosdent created:
https://issues.apache.org/jira/browse/MESOS-5207

You may need to create a JIRA account to upload a file. If it still won't let you, ask dev@ to make you a JIRA contributor.

On Thu, Apr 14, 2016 at 1:50 AM, Stefano Bianchi <[email protected]> wrote:

> OK, please follow me in this strange story.
> At the beginning I set up 3 Mesos masters on the same cluster using Mesos
> 0.27.
> Now I have deleted one of these 3 Mesos 0.27 masters and built a Mesos 0.28
> master to join the other 2.
> I get the problem I described: after 10 seconds I get "Failed to connect",
> but the election of a new leader works fine because the new leader is 0.27,
> which is stable.
> How do I send you the logs?
>
> 2016-04-13 23:53 GMT+02:00 Adam Bordelon <[email protected]>:
>
>> See also http://mesos.apache.org/documentation/latest/operational-guide/
>>
>> On Wed, Apr 13, 2016 at 2:44 PM, Adam Bordelon <[email protected]>
>> wrote:
>>
>>> Having 2 masters (even with a quorum of 2) is no more useful than having
>>> a single master, since if one of your 2 masters goes down you lose quorum,
>>> and your cluster will fail to recover, since it cannot write state changes
>>> to both masters.
>>>
>>> Setting the quorum to 1 for a cluster with 2 masters would expose you to
>>> potential split-brain problems in case of a network partition. You would
>>> then have 2 masters that each think they are the leader, and they would be
>>> unable to reconcile their differences if the partition ends and they
>>> reconnect to each other.
>>>
>>> It is intended that you will have an odd number of masters (similar to
>>> the requirement to have an odd number of ZKs), and quorum =
>>> ceiling(numMasters / 2); so you would have 1 master (quorum=1), 3 masters
>>> (quorum=2), or 5 masters (quorum=3).
>>>
>>> On Wed, Apr 13, 2016 at 8:44 AM, haosdent <[email protected]> wrote:
>>>
>>>> It sounds like an issue in 0.28. I created a ticket,
>>>> https://issues.apache.org/jira/browse/MESOS-5207, from this to continue
>>>> to investigate. @suruchi, if you could attach the logs of the Mesos
>>>> masters and your ZooKeeper configuration, I think it would be more
>>>> helpful for investigating.
>>>>
>>>> On Wed, Apr 13, 2016 at 8:24 PM, Stefano Bianchi <[email protected]>
>>>> wrote:
>>>>
>>>>> Thanks for your reply, @haosdent.
>>>>> I destroyed my VM and rebuilt Mesos 0.28 with just one master, and
>>>>> now it is working.
>>>>> I will try to add another master, but for the moment, since I don't
>>>>> have many resources on OpenStack, I need to use that VM as a slave.
>>>>> However, in the previous configuration the switch between two masters
>>>>> was OK; it was just that, more or less 30 seconds after a master became
>>>>> leader, that "Failed to connect" message appeared.
>>>>>
>>>>> 2016-04-13 13:08 GMT+02:00 haosdent <[email protected]>:
>>>>>
>>>>>> Hi @Stefano, could you show conf/zoo.cfg? And how many ZooKeeper nodes
>>>>>> do you have? And regarding "but after a while again Failed to connec",
>>>>>> how long is the interval here? Is it always a "few seconds"?
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Best Regards,
>>>> Haosdent Huang
>>>>
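
As a rough illustration of the quorum sizing rule quoted in this thread, here is a minimal Python sketch. It assumes "quorum" means a strict majority of masters, floor(n / 2) + 1, which agrees with the quoted ceiling(numMasters / 2) for odd master counts. The helper names quorum_size and tolerated_failures are made up for this example and are not part of Mesos.

```python
def quorum_size(num_masters: int) -> int:
    """Smallest strict majority of num_masters (assumed definition of quorum)."""
    if num_masters < 1:
        raise ValueError("need at least one master")
    return num_masters // 2 + 1


def tolerated_failures(num_masters: int) -> int:
    """How many masters can be lost while a majority is still reachable."""
    return num_masters - quorum_size(num_masters)


if __name__ == "__main__":
    # Compare the master counts discussed in the thread.
    for n in (1, 2, 3, 5):
        print(f"masters={n} quorum={quorum_size(n)} "
              f"tolerated_failures={tolerated_failures(n)}")
```

Under this assumption, 2 masters need a quorum of 2 and tolerate no failures, the same as a single master, while 3 masters tolerate one failure and 5 tolerate two.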

