Ok, thanks Ben! It would be nice to update the documentation accordingly. So, in 0.20 there might be a flag specifying the total number of masters?
On 23 July 2014 00:13, Benjamin Mahler <[email protected]> wrote:
> At the current time, you need an odd number of masters as there is an
> assumption built into the replicated log that the number of masters =
> 2*quorum - 1. This assumption is present when bootstrapping the log from
> no data.
>
> To recover from this, you need to run an odd number of masters, and set
> your quorum correctly. For example, 3 masters with quorum 2, or 5 masters
> with quorum 3. It is safe to wipe the replica logs before doing this.
>
> There are some outstanding tickets to clean this up:
> https://issues.apache.org/jira/browse/MESOS-1465
> https://issues.apache.org/jira/browse/MESOS-1546
>
> We'd like to have the configuration be explicit about the total number of
> masters, so that the assumption need not be made.
>
>
> On Tue, Jul 22, 2014 at 2:40 AM, Tomas Barton <[email protected]> wrote:
>
>> Hi,
>>
>> what is the best way to upgrade a Mesos cluster from 0.18 to 0.19? I've
>> tried to read all the documentation before doing the actual upgrade, but
>> I still don't understand a few things.
>>
>> What should be the quorum size?
>>
>> The --help says that "It is imperative to set this value to be a majority
>> of masters i.e., quorum > (number of masters)/2"
>>
>> I have 4 Mesos masters, which would mean that quorum > 2 -> quorum=3,
>> right?
>>
>> The recover.cpp says that: "we allow a replica in EMPTY status to become
>> VOTING immediately if it finds ALL (i.e., 2 * quorum - 1) replicas are in
>> EMPTY status"
>> So, with quorum = 3 I would need 5 Mesos masters (that's just not clear
>> from the mesos-master --help).
>>
>> quorum=1, mesos-masters=1
>> quorum=2, mesos-masters=3
>> quorum=3, mesos-masters=5
>> quorum=4, mesos-masters=7
>>
>> Is it possible to have an even number of Mesos masters? Or is it just a
>> bad idea?
>>
>> With 4 masters I got into a situation when:
>>
>> master 1:
>> I0722 11:35:40.708562 12689 replica.cpp:638] Replica in VOTING status received a broadcasted recover request
>>
>> master 2:
>> I0722 11:36:37.593647 7754 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request
>>
>> master 3:
>> I0722 11:35:14.102762 26701 recover.cpp:188] Received a recover response from a replica in STARTING status
>>
>> master 4:
>> I0722 11:35:54.284169 32056 replica.cpp:638] Replica in STARTING status received a broadcasted recover request
>> I0722 11:35:54.284425 32050 recover.cpp:188] Received a recover response from a replica in STARTING status
>> I0722 11:35:54.284788 32057 recover.cpp:188] Received a recover response from a replica in VOTING status
>> I0722 11:35:54.285127 32050 recover.cpp:188] Received a recover response from a replica in EMPTY status
>>
>> And the election algorithm ends up in an endless loop. How can I recover
>> from this? Delete all replica logs from master disk? Start with quorum=1
>> and increment number of masters?
>>
>> Thanks,
>> Tomas
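
For anyone sizing a cluster from this thread, the arithmetic can be sketched roughly as follows. This is only an illustration of the two rules quoted above (quorum > masters/2 from --help, and masters = 2*quorum - 1 from the bootstrap assumption). The helper names quorum_for and matches_bootstrap_assumption are made up for this example and are not part of Mesos.

```python
def quorum_for(masters: int) -> int:
    """Smallest quorum that is a strict majority of `masters` (hypothetical helper)."""
    if masters < 1:
        raise ValueError("need at least one master")
    return masters // 2 + 1


def matches_bootstrap_assumption(masters: int, quorum: int) -> bool:
    """True when masters == 2 * quorum - 1, the assumption described in the thread."""
    return masters == 2 * quorum - 1


# Show which master counts satisfy both rules at once.
for masters in range(1, 8):
    quorum = quorum_for(masters)
    ok = matches_bootstrap_assumption(masters, quorum)
    print(f"masters={masters} quorum={quorum} bootstrap_ok={ok}")
```

With these definitions, 4 masters gives quorum 3, which implies 5 replicas under the bootstrap assumption, so the check fails; 3 and 5 masters pass.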

