Mike Percy has posted comments on this change.
Change subject: docs: informal design for handling permanent master failures
Patch Set 1:
This is sounding more formal, let's just call it a real design doc :)
I added some suggestions to make it easier for others to review and read, IMHO.
I also added some comments about the design.
Line 19: As part of Kudu's upcoming 1.0 release, we've been working towards
Too wordy. How about replacing this paragraph with:
This document discusses one of the remaining gaps in Kudu's support for
multi-master operation: adding and removing masters.
Regarding the transient vs permanent thing, adding / removing masters has
nothing to do with transient failures. Maybe I'm assuming too much but when I
read this doc I would assume that is obvious. If it's not, then maybe it's
worth a single sentence: "Transient failures are handled by the normal Raft
leader election mechanisms."
Line 26: Let's assume we have a healthy Raft configuration consisting of three
I think we should remove this paragraph entirely. It has nothing to do with the
purpose of this design doc.
If you want to keep it, how about add a section header along the lines of
"Handling transient failures" or something, so people can just skip it.
Line 38: The most important question is: should we build permanent failure
Maybe preface this with a section header "Manual repair when data from a downed
node is still available". We still haven't gotten to the actual design part yet.
Line 64: ## Proposal
How about "Design proposal for dynamically adding / removing masters"
Before we get to the algorithm, this needs an intro to summarize the proposal
in English. along the lines of:
We will reuse most of the configuration change and remote bootstrap design and
implementation from the tablet servers. The differences are:
1) Adding and removing masters is not performed automatically. An administrator
must trigger those actions with some tool.
2) Administrators must also modify command line flags on tablet servers when a
master is replaced, so that the tablet servers can find the new master.
3) Masters will decide whether to self-initialize a new configuration or join
an existing one based on their command-line flags and their consensus metadata
files (discussed below in Master directory management, phase 2)
Line 66: Here is the sequence of events:
I think "algorithm" would be a better name for this
Line 95: While we believe this approach is correct, we would like more eyes on
Since this is now basically a formal design doc, we can assume more eyes on it
is a given. Let's replace this whole paragraph with the following:
In order to implement the above design, we'll need to make the following
PS1, Line 119: mitigate the above confusion
how about "To avoid the above inconsistency in the semantics of the
Line 121: disk. Then, we can remove **--master_addresses** and use a new
"remove --master_addresses gflag from the kudu-master processes" ... maybe some
wording like this will clarify that we're not talking about tservers here, for
anybody who's lost track of that
Line 126: started in normal mode, it creates a Raft config of one (just itself)
This mode stuff is quite confusing. I'm leaning toward implementing a totally
manual process, including a "format master" command to "spawn" our first master
So basically if a master starts up, and it has an existing master tablet (with
assumedly a consensus metadata file) then it starts up in normal mode. If it
doesn't, it starts up in listening mode.
Line 133: 1. There’s a certain desire to completely remove **last_known_addr**
s/There's a certain desire/It's useful on tablet servers/
Line 134: **RaftPeerPB** (as only UUIDs should be necessary for identifying
only UUIDs should be necessary, and this makes it easier for tablet servers to
change IPs and ports transparently
Line 153: the semantics of **--tserver_master_addrs** become confusing just like
Line 156: removed. To alleviate the confusion we could do the same thing we did
I think this process is overly onerous. It also makes it possible to "orphan" a
tserver with no way back that is covered by this spec.
How about we just say that --tserver_master_addrs is a pool of potential
masters, tservers don't store anything about the masters on disk, and it can be
modified in memory by the above mechanisms (i.e. ListMasters()) but if you want
additional potential masters to be known at boot time then they have to be in
Well, maybe the tserver should store one thing on disk: the sequence or OpId of
the last committed master config. This would help prevent against action by
rogue masters. Although that is not really relevant to this doc, it's more
relevant to multi-master operation overall. We may have already discussed and
resolved this issue in that context, if so then we can punt on this
Line 169: the configuration recover from this incomplete state?
Yeah, we may need to look at implementing KUDU-1194: consensus: Allow abort of
uncommittable config change ops, or potentially something where we allow
tombstoned tablets or non-members of the config to vote (KUDU-871: Allow
tombstoned tablets to vote)
To view, visit http://gerrit.cloudera.org:8080/3393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Owner: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>