Mike Percy has posted comments on this change.

Change subject: docs: informal design for handling permanent master failures

Patch Set 1:


This is sounding more formal, let's just call it a real design doc :)

I added some suggestions to make it easier for others to review and read, IMHO.

I also added some comments about the design.

File docs/design-docs/master-perm-failure-1.0.md:

Line 19: As part of Kudu's upcoming 1.0 release, we've been working towards 
Too wordy. How about replacing this paragraph with:

This document discusses one of the remaining gaps in Kudu's support for 
multi-master operation: adding and removing masters.

Regarding the transient vs permanent thing, adding / removing masters has 
nothing to do with transient failures. Maybe I'm assuming too much but when I 
read this doc I would assume that is obvious. If it's not, then maybe it's 
worth a single sentence: "Transient failures are handled by the normal Raft 
leader election mechanisms."

Line 26: Let's assume we have a healthy Raft configuration consisting of three
I think we should remove this paragraph entirely. It has nothing to do with the 
purpose of this design doc.

If you want to keep it, how about add a section header along the lines of 
"Handling transient failures" or something, so people can just skip it.

Line 38: The most important question is: should we build permanent failure 
handling into
Maybe preface this with a section header "Manual repair when data from a downed 
node is still available". We still haven't gotten to the actual design part yet.

Line 64: ## Proposal
How about "Design proposal for dynamically adding / removing masters"

Line 65: 
Before we get to the algorithm, this needs an intro to summarize the proposal 
in English. along the lines of:

We will reuse most of the configuration change and remote bootstrap design and 
implementation from the tablet servers. The differences are:

1) Adding and removing masters is not performed automatically. An administrator 
must trigger those actions with some tool.
2) Administrators must also modify command line flags on tablet servers when a 
master is replaced, so that the tablet servers can find the new master.
3) Masters will decide whether to self-initialize a new configuration or join 
an existing one based on their command-line flags and their consensus metadata 
files (discussed below in Master directory management, phase 2)

Line 66: Here is the sequence of events:
I think "algorithm" would be a better name for this

Line 95: While we believe this approach is correct, we would like more eyes on 
it to
Since this is now basically a formal design doc, we can assume more eyes on it 
is a given. Let's replace this whole paragraph with the following:

In order to implement the above design, we'll need to make the following 

PS1, Line 119: mitigate the above confusion
how about "To avoid the above inconsistency in the semantics of the 
--master_addresses gflag,"

Line 121: disk. Then, we can remove **--master_addresses** and use a new 
"remove --master_addresses gflag from the kudu-master processes" ... maybe some 
wording like this will clarify that we're not talking about tservers here, for 
anybody who's lost track of that

Line 126: started in normal mode, it creates a Raft config of one (just itself) 
This mode stuff is quite confusing. I'm leaning toward implementing a totally 
manual process, including a "format master" command to "spawn" our first master 

So basically if a master starts up, and it has an existing master tablet (with 
assumedly a consensus metadata file) then it starts up in normal mode. If it 
doesn't, it starts up in listening mode.

Line 133: 1. There’s a certain desire to completely remove **last_known_addr** 
s/There's a certain desire/It's useful on tablet servers/

Line 134:    **RaftPeerPB** (as only UUIDs should be necessary for identifying 
members of
only UUIDs should be necessary, and this makes it easier for tablet servers to 
change IPs and ports transparently

Line 153: the semantics of **--tserver_master_addrs** become confusing just like

Line 156: removed. To alleviate the confusion we could do the same thing we did 
I think this process is overly onerous. It also makes it possible to "orphan" a 
tserver with no way back that is covered by this spec.

How about we just say that --tserver_master_addrs is a pool of potential 
masters, tservers don't store anything about the masters on disk, and it can be 
modified in memory by the above mechanisms (i.e. ListMasters()) but if you want 
additional potential masters to be known at boot time then they have to be in 
the gflags.

Well, maybe the tserver should store one thing on disk: the sequence or OpId of 
the last committed master config. This would help prevent against action by 
rogue masters. Although that is not really relevant to this doc, it's more 
relevant to multi-master operation overall. We may have already discussed and 
resolved this issue in that context, if so then we can punt on this

Line 169: the configuration recover from this incomplete state?
Yeah, we may need to look at implementing KUDU-1194: consensus: Allow abort of 
uncommittable config change ops, or potentially something where we allow 
tombstoned tablets or non-members of the config to vote (KUDU-871: Allow 
tombstoned tablets to vote)

To view, visit http://gerrit.cloudera.org:8080/3393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2f05c319c89cf37e2d71fdc4b7ec951b2932a2b2
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to