Ok, thanks Ben! It would be nice to update the documentation accordingly.

So, in 0.20 there might be a flag specifying the total number of masters?
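
For anyone else hitting this, here is a minimal sketch of the arithmetic as I understand it from Ben's reply: the replicated log assumes masters = 2*quorum - 1, while --help only asks for quorum > masters/2. The function names are my own and purely illustrative; none of this is Mesos code or a Mesos API.

```python
def expected_masters(quorum: int) -> int:
    """Master count the replicated log assumes for a quorum: 2*quorum - 1."""
    if quorum < 1:
        raise ValueError("quorum must be at least 1")
    return 2 * quorum - 1


def majority_quorum(masters: int) -> int:
    """Smallest quorum satisfying quorum > masters / 2 (the --help rule)."""
    if masters < 1:
        raise ValueError("need at least one master")
    return masters // 2 + 1


def is_consistent(masters: int, quorum: int) -> bool:
    """True when the master count matches the 2*quorum - 1 assumption."""
    return masters == expected_masters(quorum)


if __name__ == "__main__":
    # Compare a few cluster sizes, including the 4-master case from this thread.
    for masters in (1, 3, 4, 5, 7):
        quorum = majority_quorum(masters)
        status = "ok" if is_consistent(masters, quorum) else "mismatch"
        print(f"masters={masters} quorum={quorum} "
              f"assumed={expected_masters(quorum)} {status}")
```

With 4 masters the majority rule gives quorum=3, but that quorum implies 5 masters, which is the mismatch I ran into. Odd sizes (1, 3, 5, 7) satisfy both rules at once.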


On 23 July 2014 00:13, Benjamin Mahler <[email protected]> wrote:

> At the current time, you need an odd number of masters as there is an
> assumption built into the replicated log that the number of masters = 2*quorum
> - 1. This assumption is present when bootstrapping the log from no data.
>
> To recover from this, you need to run an odd number of masters, and set
> your quorum correctly. For example, 3 masters with quorum 2, or 5 masters
> with quorum 3. It is safe to wipe the replica logs before doing this.
>
> There are some outstanding tickets to clean this up:
> https://issues.apache.org/jira/browse/MESOS-1465
> https://issues.apache.org/jira/browse/MESOS-1546
>
> We'd like to have the configuration be explicit about the total number of
> masters, so that the assumption need not be made.
>
>
> On Tue, Jul 22, 2014 at 2:40 AM, Tomas Barton <[email protected]>
> wrote:
>
>> Hi,
>>
>> what is the best way to upgrade Mesos cluster from 0.18 to 0.19? I've
>> tried to read all documentation before doing actual upgrade, but I still
>> don't understand a few things.
>>
>> What should be the quorum size?
>>
>> The --help says that "It is imperative to set this value to be a majority
>> of masters i.e., quorum > (number of masters)/2"
>>
>> I have 4 Mesos masters, which would mean that quorum > 2 -> quorum=3,
>> right?
>>
>> The recover.cpp says that: "we allow a replica in EMPTY status to become
>> VOTING immediately if it finds ALL (i.e., 2 * quorum - 1) replicas are in
>> EMPTY status"
>> So, with quorum = 3 I would need 5 Mesos masters (that's just not clear
>> from the mesos-master --help).
>>
>> quorum=1, mesos-masters=1
>> quorum=2, mesos-masters=3
>> quorum=3, mesos-masters=5
>> quorum=4, mesos-masters=7
>>
>> Is it possible to have an even number of Mesos masters, or is it just a
>> bad idea?
>>
>> With 4 masters I got into a situation where:
>>
>> master 1:
>> I0722 11:35:40.708562 12689 replica.cpp:638] Replica in VOTING status
>> received a broadcasted recover request
>>
>> master 2:
>> I0722 11:36:37.593647  7754 replica.cpp:638] Replica in EMPTY status
>> received a broadcasted recover request
>>
>> master 3:
>> I0722 11:35:14.102762 26701 recover.cpp:188] Received a recover response
>> from a replica in STARTING status
>>
>> master 4:
>> I0722 11:35:54.284169 32056 replica.cpp:638] Replica in STARTING status
>> received a broadcasted recover request
>> I0722 11:35:54.284425 32050 recover.cpp:188] Received a recover response
>> from a replica in STARTING status
>> I0722 11:35:54.284788 32057 recover.cpp:188] Received a recover response
>> from a replica in VOTING status
>> I0722 11:35:54.285127 32050 recover.cpp:188] Received a recover response
>> from a replica in EMPTY status
>>
>> And the election algorithm ends up in an endless loop. How can I recover
>> from this? Delete all replica logs from master disk? Start with quorum=1
>> and increment number of masters?
>>
>> Thanks,
>> Tomas
>>
>
>
