Adar Dembo has posted comments on this change.

Change subject: docs: informal design for handling permanent master failures

Patch Set 1:

File docs/design-docs/

Line 19: As part of Kudu's upcoming 1.0 release, we've been working towards 
> Too wordy. How about replacing this paragraph with:
I think you're approaching this document in a different way than what I 

My original intent was for the doc to be feature-driven. "Surviving transient 
failures" is a feature. "Surviving permanent failures with an intact disk" is a 
feature. "Surviving all permanent failures, period" is a feature. I'm trying to 
draw this distinction to help us decide what kind of permanent failure handling 
(if any) we should implement in time for Kudu 1.0, given the feature set we'd 
derive and the time it'd take to implement.

Does that make sense? It's why I'm not drawing attention to adding/removing 
masters until later on, and why I led with this paragraph to frame the 
discussion. I'll reword it to be more terse, but I would like to keep it 
largely intact for this reason.

Line 26: Let's assume we have a healthy Raft configuration consisting of three
> I think we should remove this paragraph entirely. It has nothing to do with
I'd like to keep it so I added a section header.

Line 38: The most important question is: should we build permanent failure 
handling into
> Maybe preface this with a section header "Manual repair when data from a do

Line 64: ## Proposal
> How about "Design proposal for dynamically adding / removing masters"
I've moved the above paragraph into this section, so I'd like this section's 
label to convey "a proposal for handling any kind of permanent failure" (the 
feature) rather than "a proposal for adding/removing masters" (the design and 

I've reworded it, but due to the above rationale, not exactly in the way that 
you suggested.

Line 65: 
> Before we get to the algorithm, this needs an intro to summarize the propos
Good idea. I've incorporated your text nearly verbatim.

Line 66: Here is the sequence of events:
> I think "algorithm" would be a better name for this

Line 95: While we believe this approach is correct, we would like more eyes on 
it to
> Since this is now basically a formal design doc, we can assume more eyes on

PS1, Line 119: mitigate the above confusion
> how about "To avoid the above inconsistency in the semantics of the --maste
Given the proximity to the previous paragraph, I think it's implicit that we're 
talking about the --master_addresses gflag. But I'll reword a bit to link back 
to "inconsistent semantics"

Line 121: disk. Then, we can remove **--master_addresses** and use a new 
> "remove --master_addresses gflag from the kudu-master processes" ... maybe 
I'll clarify, though I'm trying to be careful and not confuse people into 
thinking --master_addresses is a tserver gflag too.

I also added a blurb to the first reference to --master_addresses and 
--tserver_master_addrs to explain what each is.

Line 126: started in normal mode, it creates a Raft config of one (just itself) 
> This mode stuff is quite confusing. I'm leaning toward implementing a total
I specified it this way so that just running the kudu-master binary with 
minimal gflag configuration from the command line yields an operational 
single-node master. Requiring a "format" for that case (which is exercised 
often by manual and automated tests alike) would be a frustrating tax.

That said, perhaps the "format" approach would be ideal for the multi-master 
case. For a single node, just run kudu-master as-is. For multi-node, do a 
"format" first, then run kudu-master (no special gflags). The master should 
have enough information at that point to come up in "listening" mode and await 
remote bootstrap, right? Effectively this would mean inserting the format 
command just before step 5 in the algorithm.

What do you think?

Line 133: 1. There’s a certain desire to completely remove **last_known_addr** 
> s/There's a certain desire/It's useful on tablet servers/

Line 134:    **RaftPeerPB** (as only UUIDs should be necessary for identifying 
members of
> only UUIDs should be necessary, and this makes it easier for tablet servers

Line 153: the semantics of **--tserver_master_addrs** become confusing just like
> s/become/becomes/
Nope, the noun here is semantics, which is plural. "The semantics become 
confusing", not "the semantics becomes confusing".

Line 156: removed. To alleviate the confusion we could do the same thing we did 
> I think this process is overly onerous. It also makes it possible to "orpha
I understand that the union of --tserver_master_addrs and a dynamic discovery 
mechanism is more flexible, but I think it's also tougher to reason about ("why 
is this tserver heartbeating to that master? Is it because of the gflag value? 
Is it because of what the tserver found at runtime?"). I generally prefer 
solutions that are more predictable in nature, since that means an easier time 
troubleshooting (for us) and operating (for administrators).

Onerousness aside, how is orphaning possible? I assumed that, just as with the 
master, the on-disk RaftConfigPB can be modified by a command line tool 
(probably the same tool as is used by the master). That way, if an old tserver 
starts and can't find any of it's (now long gone) masters, the operator can use 
the tool to rewrite the master configuration.

I don't think rogue masters (in the "tserver is talking to a stale master that 
has been partitioned from its Raft config" sense of the word) are an issue, for 
the same reason that they aren't an issue in the rest of the multi-master 
world: any destructive operation that the rogue master could carry out on the 
tserver must be replicated first, at which point the rogue master would notice 
that it's no longer the leader.

But maybe you meant "rogue master" as in "wrong master"? Like, a tserver that 
was configured to heartbeat to the wrong Kudu deployment? In that case, the 
tserver could check the UUIDs of incoming requests and heartbeat responses 
against the UUIDs of the on-disk RaftConfigPB, and ignore requests coming from 
those masters.

Line 169: the configuration recover from this incomplete state?
> Yeah, we may need to look at implementing KUDU-1194: consensus: Allow abort
I'll add references to the bugs you mentioned, but I'm going to leave this 
section largely unspecified for now, since I'd rather not sink a ton of time 
into a design if we're feeling we may not implement it yet.

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I2f05c319c89cf37e2d71fdc4b7ec951b2932a2b2
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <>
Gerrit-Reviewer: Adar Dembo <>
Gerrit-Reviewer: David Ribeiro Alves <>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <>
Gerrit-Reviewer: Todd Lipcon <>
Gerrit-HasComments: Yes

Reply via email to