Adar Dembo has posted comments on this change.
Change subject: docs: informal design for handling permanent master failures
Patch Set 1:
Line 19: As part of Kudu's upcoming 1.0 release, we've been working towards
> Too wordy. How about replacing this paragraph with:
I think you're approaching this document in a different way than what I
My original intent was for the doc to be feature-driven. "Surviving transient
failures" is a feature. "Surviving permanent failures with an intact disk" is a
feature. "Surviving all permanent failures, period" is a feature. I'm trying to
draw this distinction to help us decide what kind of permanent failure handling
(if any) we should implement in time for Kudu 1.0, given the feature set we'd
derive and the time it'd take to implement.
Does that make sense? It's why I'm not drawing attention to adding/removing
masters until later on, and why I led with this paragraph to frame the
discussion. I'll reword it to be more terse, but I would like to keep it
largely intact for this reason.
Line 26: Let's assume we have a healthy Raft configuration consisting of three
> I think we should remove this paragraph entirely. It has nothing to do with
I'd like to keep it so I added a section header.
Line 38: The most important question is: should we build permanent failure
> Maybe preface this with a section header "Manual repair when data from a do
Line 64: ## Proposal
> How about "Design proposal for dynamically adding / removing masters"
I've moved the above paragraph into this section, so I'd like this section's
label to convey "a proposal for handling any kind of permanent failure" (the
feature) rather than "a proposal for adding/removing masters" (the design and
I've reworded it, but due to the above rationale, not exactly in the way that
> Before we get to the algorithm, this needs an intro to summarize the propos
Good idea. I've incorporated your text nearly verbatim.
Line 66: Here is the sequence of events:
> I think "algorithm" would be a better name for this
Line 95: While we believe this approach is correct, we would like more eyes on
> Since this is now basically a formal design doc, we can assume more eyes on
PS1, Line 119: mitigate the above confusion
> how about "To avoid the above inconsistency in the semantics of the --maste
Given the proximity to the previous paragraph, I think it's implicit that we're
talking about the --master_addresses gflag. But I'll reword a bit to link back
to "inconsistent semantics"
Line 121: disk. Then, we can remove **--master_addresses** and use a new
> "remove --master_addresses gflag from the kudu-master processes" ... maybe
I'll clarify, though I'm trying to be careful and not confuse people into
thinking --master_addresses is a tserver gflag too.
I also added a blurb to the first reference to --master_addresses and
--tserver_master_addrs to explain what each is.
Line 126: started in normal mode, it creates a Raft config of one (just itself)
> This mode stuff is quite confusing. I'm leaning toward implementing a total
I specified it this way so that just running the kudu-master binary with
minimal gflag configuration from the command line yields an operational
single-node master. Requiring a "format" for that case (which is exercised
often by manual and automated tests alike) would be a frustrating tax.
That said, perhaps the "format" approach would be ideal for the multi-master
case. For a single node, just run kudu-master as-is. For multi-node, do a
"format" first, then run kudu-master (no special gflags). The master should
have enough information at that point to come up in "listening" mode and await
remote bootstrap, right? Effectively this would mean inserting the format
command just before step 5 in the algorithm.
What do you think?
Line 133: 1. There’s a certain desire to completely remove **last_known_addr**
> s/There's a certain desire/It's useful on tablet servers/
Line 134: **RaftPeerPB** (as only UUIDs should be necessary for identifying
> only UUIDs should be necessary, and this makes it easier for tablet servers
Line 153: the semantics of **--tserver_master_addrs** become confusing just like
Nope, the noun here is semantics, which is plural. "The semantics become
confusing", not "the semantics becomes confusing".
Line 156: removed. To alleviate the confusion we could do the same thing we did
> I think this process is overly onerous. It also makes it possible to "orpha
I understand that the union of --tserver_master_addrs and a dynamic discovery
mechanism is more flexible, but I think it's also tougher to reason about ("why
is this tserver heartbeating to that master? Is it because of the gflag value?
Is it because of what the tserver found at runtime?"). I generally prefer
solutions that are more predictable in nature, since that means an easier time
troubleshooting (for us) and operating (for administrators).
Onerousness aside, how is orphaning possible? I assumed that, just as with the
master, the on-disk RaftConfigPB can be modified by a command line tool
(probably the same tool as is used by the master). That way, if an old tserver
starts and can't find any of it's (now long gone) masters, the operator can use
the tool to rewrite the master configuration.
I don't think rogue masters (in the "tserver is talking to a stale master that
has been partitioned from its Raft config" sense of the word) are an issue, for
the same reason that they aren't an issue in the rest of the multi-master
world: any destructive operation that the rogue master could carry out on the
tserver must be replicated first, at which point the rogue master would notice
that it's no longer the leader.
But maybe you meant "rogue master" as in "wrong master"? Like, a tserver that
was configured to heartbeat to the wrong Kudu deployment? In that case, the
tserver could check the UUIDs of incoming requests and heartbeat responses
against the UUIDs of the on-disk RaftConfigPB, and ignore requests coming from
Line 169: the configuration recover from this incomplete state?
> Yeah, we may need to look at implementing KUDU-1194: consensus: Allow abort
I'll add references to the bugs you mentioned, but I'm going to leave this
section largely unspecified for now, since I'd rather not sink a ton of time
into a design if we're feeling we may not implement it yet.
To view, visit http://gerrit.cloudera.org:8080/3393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Owner: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>