Re: Mesos Masters Leader Keeps Fluctuating

Adam Bordelon Thu, 14 Apr 2016 02:05:11 -0700

You could try attaching them to this thread, but ideally we'd want them
attached to the JIRA that haosdent created:
https://issues.apache.org/jira/browse/MESOS-5207
You may need to create a JIRA account to upload a file. If it still won't
let you, ask dev@ to make you a JIRA contributor.


On Thu, Apr 14, 2016 at 1:50 AM, Stefano Bianchi <[email protected]>
wrote:

> OK, please follow me through this strange story.
> At the beginning I set up 3 Mesos masters on the same cluster using Mesos
> 0.27.
> Now I have deleted one of these 3 Mesos 0.27 masters and built a Mesos 0.28
> master to join the other 2.
> I get the problem I described: after 10 seconds I get "Failed to connect",
> but the election of a new leader works fine because the new leader is 0.27,
> which is stable.
> How can I send you the logs?
>
> 2016-04-13 23:53 GMT+02:00 Adam Bordelon <[email protected]>:
>
>> See also http://mesos.apache.org/documentation/latest/operational-guide/
>>
>> On Wed, Apr 13, 2016 at 2:44 PM, Adam Bordelon <[email protected]>
>> wrote:
>>
>>> Having 2 masters (even with a quorum of 2) is no more useful than having
>>> a single master, since if one of your 2 masters goes down you lose quorum,
>>> and your cluster will fail to recover, since it cannot write state changes
>>> to both masters.
>>>
>>> Setting the quorum to 1 for a cluster with 2 masters would expose you to
>>> potential split-brain problems, in case of a network partition. You would
>>> then have 2 masters that each think they are the leader, and they would be
>>> unable to reconcile their differences if the partition ends and they
>>> reconnect to each other.
>>>
>>> It is intended that you will have an odd number of masters (similar to
>>> the requirement to have an odd number of ZKs), and quorum =
>>> ceiling(numMasters / 2); so you would have 1 master (quorum=1), 3 masters
>>> (quorum=2), or 5 masters (quorum=3).
>>>
>>> On Wed, Apr 13, 2016 at 8:44 AM, haosdent <[email protected]> wrote:
>>>
>>>> It sounds like an issue in 0.28. I created a ticket,
>>>> https://issues.apache.org/jira/browse/MESOS-5207, from this to continue
>>>> investigating. @suruchi, if you could attach the logs of the Mesos
>>>> masters and your ZooKeeper configuration, I think that would be more
>>>> helpful for the investigation.
>>>>
>>>> On Wed, Apr 13, 2016 at 8:24 PM, Stefano Bianchi <[email protected]>
>>>> wrote:
>>>>
>>>>> Thanks for your reply, @haosdent.
>>>>> I destroyed my VM and rebuilt Mesos 0.28 with just one master, and
>>>>> now it is working.
>>>>> I will try to add another master, but for the moment, since I don't
>>>>> have many resources on OpenStack, I need to use that VM as a slave.
>>>>> However, in the previous configuration the switch between the two
>>>>> masters was OK; only after a master had been leading for roughly 30
>>>>> seconds did that "Failed to connect" message appear.
>>>>>
>>>>> 2016-04-13 13:08 GMT+02:00 haosdent <[email protected]>:
>>>>>
>>>>>> Hi @Stefano, could you show conf/zoo.cfg? How many ZooKeeper nodes
>>>>>> do you have? And regarding "but after a while again Failed to connect",
>>>>>> how long is the interval here? Is it always "few seconds"?
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards,
>>>> Haosdent Huang
>>>>
>>>
>>>
>>
>
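
The quorum rule quoted above (an odd number of masters, with quorum =
ceiling(numMasters / 2)) can be illustrated with a small Python sketch. The
function names here are hypothetical and are not part of Mesos. The sketch
uses the general majority formula floor(n / 2) + 1, which is an assumption on
my part: it matches ceiling(n / 2) for odd n, and gives 2 for a 2-master
cluster, consistent with the "quorum of 2" case discussed in the thread.

```python
import math


def majority_quorum(num_masters: int) -> int:
    """Smallest number of masters that forms a strict majority."""
    if num_masters < 1:
        raise ValueError("need at least one master")
    return num_masters // 2 + 1


def tolerated_failures(num_masters: int) -> int:
    """How many masters can fail while a majority is still reachable."""
    return num_masters - majority_quorum(num_masters)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(
            f"masters={n} quorum={majority_quorum(n)} "
            f"tolerated_failures={tolerated_failures(n)}"
        )
```

Under these assumptions, 2 masters tolerate no more failures than 1 master
does, which is the point made in the thread about running an even number of
masters.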
