[kudu-CR] docs: informal design for handling permanent master failures
Adar Dembo has posted comments on this change.

Change subject: docs: informal design for handling permanent master failures
......................................................................

Patch Set 1: (15 comments)

http://gerrit.cloudera.org:8080/#/c/3393/1/docs/design-docs/master-perm-failure-1.0.md
File docs/design-docs/master-perm-failure-1.0.md:

Line 19: As part of Kudu's upcoming 1.0 release, we've been working towards improving
> Too wordy. How about replacing this paragraph with:

I think you're approaching this document in a different way than what I intended. My original intent was for the doc to be feature-driven. "Surviving transient failures" is a feature. "Surviving permanent failures with an intact disk" is a feature. "Surviving all permanent failures, period" is a feature. I'm trying to draw this distinction to help us decide what kind of permanent failure handling (if any) we should implement in time for Kudu 1.0, given the feature set we'd derive and the time it'd take to implement. Does that make sense? It's why I'm not drawing attention to adding/removing masters until later on, and why I led with this paragraph to frame the discussion.

I'll reword it to be more terse, but I would like to keep it largely intact for this reason.

Line 26: Let's assume we have a healthy Raft configuration consisting of three
> I think we should remove this paragraph entirely. It has nothing to do with

I'd like to keep it, so I added a section header.

Line 38: The most important question is: should we build permanent failure handling into
> Maybe preface this with a section header "Manual repair when data from a do

Done

Line 64: ## Proposal
> How about "Design proposal for dynamically adding / removing masters"

I've moved the above paragraph into this section, so I'd like this section's label to convey "a proposal for handling any kind of permanent failure" (the feature) rather than "a proposal for adding/removing masters" (the design and implementation). I've reworded it, but due to the above rationale, not exactly in the way that you suggested.

Line 65:
> Before we get to the algorithm, this needs an intro to summarize the propos

Good idea. I've incorporated your text nearly verbatim.

Line 66: Here is the sequence of events:
> I think "algorithm" would be a better name for this

Done

Line 95: While we believe this approach is correct, we would like more eyes on it to
> Since this is now basically a formal design doc, we can assume more eyes on

Done

PS1, Line 119: mitigate the above confusion
> how about "To avoid the above inconsistency in the semantics of the --maste

Given the proximity to the previous paragraph, I think it's implicit that we're talking about the --master_addresses gflag. But I'll reword a bit to link back to "inconsistent semantics".

Line 121: disk. Then, we can remove **--master_addresses** and use a new kudu-admin
> "remove --master_addresses gflag from the kudu-master processes" ... maybe

I'll clarify, though I'm trying to be careful not to confuse people into thinking --master_addresses is a tserver gflag too. I also added a blurb to the first reference to --master_addresses and --tserver_master_addrs to explain what each is.

Line 126: started in normal mode, it creates a Raft config of one (just itself) and
> This mode stuff is quite confusing. I'm leaning toward implementing a total

I specified it this way so that just running the kudu-master binary with minimal gflag configuration from the command line yields an operational single-node master. Requiring a "format" for that case (which is exercised often by manual and automated tests alike) would be a frustrating tax.

That said, perhaps the "format" approach would be ideal for the multi-master case. For a single node, just run kudu-master as-is. For multi-node, do a "format" first, then run kudu-master (no special gflags). The master should have enough information at that point to come up in "listening" mode and await remote bootstrap, right? Effectively this would mean inserting the format command just before step 5 in the algorithm. What do you think?

Line 133: 1. There's a certain desire to completely remove **last_known_addr** from
> s/There's a certain desire/It's useful on tablet servers/

Done

Line 134: **RaftPeerPB** (as only UUIDs should be necessary for identifying members of
> only UUIDs should be necessary, and this makes it easier for tablet servers

Done

Line 153: the semantics of **--tserver_master_addrs** become confusing just like
> s/become/becomes/

Nope, the noun here is "semantics", which is plural: "the semantics become confusing", not "the semantics becomes confusing".

Line 156: removed. To alleviate the confusion we could do the same thing we did for
> I think this process is overly onerous. It also makes it possible to "orpha

I understand that the union of --tserver_master_addrs and a dynamic discovery mechanism is more flexible,
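[Editor's note: the startup-mode decision discussed above (existing consensus metadata wins; a prior "format" step puts a multi-node master into listening mode; a bare single-node start self-initializes) can be illustrated with a small decision function. This is a hypothetical sketch of a proposal still under discussion, not Kudu's actual startup code; the path name, flag name, and mode labels are invented for illustration.]

```python
import os


def startup_mode(master_wal_dir, formatted_for_multi_master):
    """Decide how a hypothetical kudu-master would come up.

    - Existing consensus metadata: this master has run before, so it
      starts in normal mode with its persisted Raft config.
    - Formatted for multi-master but no cmeta yet: come up in
      "listening" mode and await remote bootstrap from an existing
      master (i.e. the format step just before step 5 of the algorithm).
    - Neither: the bare single-node case; self-initialize a Raft config
      of one (just itself), so manual and automated tests need no
      explicit "format" step.
    """
    cmeta_path = os.path.join(master_wal_dir, "consensus-meta")
    if os.path.exists(cmeta_path):
        return "normal"
    if formatted_for_multi_master:
        return "listening"
    return "create-single-node-config"
```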
[kudu-CR] docs: informal design for handling permanent master failures
Kudu Jenkins has posted comments on this change.

Change subject: docs: informal design for handling permanent master failures
......................................................................

Patch Set 1:

Build Started http://104.196.14.100/job/kudu-gerrit/1852/

--
To view, visit http://gerrit.cloudera.org:8080/3393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2f05c319c89cf37e2d71fdc4b7ec951b2932a2b2
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: David Ribeiro Alves
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Todd Lipcon
Gerrit-HasComments: No
[kudu-CR] docs: informal design for handling permanent master failures
Mike Percy has posted comments on this change.

Change subject: docs: informal design for handling permanent master failures
......................................................................

Patch Set 1: (15 comments)

This is sounding more formal, let's just call it a real design doc :) I added some suggestions to make it easier for others to review and read, IMHO. I also added some comments about the design.

http://gerrit.cloudera.org:8080/#/c/3393/1/docs/design-docs/master-perm-failure-1.0.md
File docs/design-docs/master-perm-failure-1.0.md:

Line 19: As part of Kudu's upcoming 1.0 release, we've been working towards improving
Too wordy. How about replacing this paragraph with: "This document discusses one of the remaining gaps in Kudu's support for multi-master operation: adding and removing masters."

Regarding the transient vs. permanent thing, adding/removing masters has nothing to do with transient failures. Maybe I'm assuming too much, but when I read this doc I would assume that is obvious. If it's not, then maybe it's worth a single sentence: "Transient failures are handled by the normal Raft leader election mechanisms."

Line 26: Let's assume we have a healthy Raft configuration consisting of three
I think we should remove this paragraph entirely. It has nothing to do with the purpose of this design doc. If you want to keep it, how about adding a section header along the lines of "Handling transient failures" or something, so people can just skip it.

Line 38: The most important question is: should we build permanent failure handling into
Maybe preface this with a section header "Manual repair when data from a downed node is still available". We still haven't gotten to the actual design part yet.

Line 64: ## Proposal
How about "Design proposal for dynamically adding / removing masters"

Line 65:
Before we get to the algorithm, this needs an intro to summarize the proposal in English, along the lines of: "We will reuse most of the configuration change and remote bootstrap design and implementation from the tablet servers. The differences are:

1) Adding and removing masters is not performed automatically. An administrator must trigger those actions with some tool.

2) Administrators must also modify command line flags on tablet servers when a master is replaced, so that the tablet servers can find the new master.

3) Masters will decide whether to self-initialize a new configuration or join an existing one based on their command-line flags and their consensus metadata files (discussed below in Master directory management, phase 2)."

Line 66: Here is the sequence of events:
I think "algorithm" would be a better name for this.

Line 95: While we believe this approach is correct, we would like more eyes on it to
Since this is now basically a formal design doc, we can assume more eyes on it is a given. Let's replace this whole paragraph with the following: "In order to implement the above design, we'll need to make the following changes:"

PS1, Line 119: mitigate the above confusion
How about "To avoid the above inconsistency in the semantics of the --master_addresses gflag,"

Line 121: disk. Then, we can remove **--master_addresses** and use a new kudu-admin
"remove --master_addresses gflag from the kudu-master processes" ... maybe some wording like this will clarify that we're not talking about tservers here, for anybody who's lost track of that.

Line 126: started in normal mode, it creates a Raft config of one (just itself) and
This mode stuff is quite confusing. I'm leaning toward implementing a totally manual process, including a "format master" command to "spawn" our first master server(s). So basically, if a master starts up and it has an existing master tablet (with, assumedly, a consensus metadata file), then it starts up in normal mode. If it doesn't, it starts up in listening mode.

Line 133: 1. There's a certain desire to completely remove **last_known_addr** from
s/There's a certain desire/It's useful on tablet servers/

Line 134: **RaftPeerPB** (as only UUIDs should be necessary for identifying members of
Only UUIDs should be necessary, and this makes it easier for tablet servers to change IPs and ports transparently.

Line 153: the semantics of **--tserver_master_addrs** become confusing just like
s/become/becomes/

Line 156: removed. To alleviate the confusion we could do the same thing we did for
I think this process is overly onerous. It also makes it possible to "orphan" a tserver with no way back that is covered by this spec. How about we just say that --tserver_master_addrs is a pool of potential masters, tservers don't store anything about the masters on disk, and it can be modified in memory by the above mechanisms (i.e. ListMasters()), but if you want additional potential masters to be known at boot time then they have to be in the gflags. Well, maybe the tserver should store one thing on disk: the sequence or OpId of
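[Editor's note: the "pool of potential masters" idea above amounts to a simple in-memory merge: seed the pool from --tserver_master_addrs at boot, then fold in addresses learned dynamically (e.g. from a ListMasters() RPC), persisting nothing to disk so a restart falls back to the gflags. A minimal sketch, with the function name and address format invented for illustration:]

```python
def merged_master_pool(gflag_addrs, discovered_addrs):
    """Union of boot-time master addresses and dynamically learned ones.

    gflag_addrs: addresses from --tserver_master_addrs (boot-time seeds).
    discovered_addrs: addresses learned at runtime, e.g. via ListMasters().
    The pool lives only in memory; nothing is written to disk, so after a
    restart the tserver knows only what its gflags say.
    """
    pool = list(gflag_addrs)          # keep boot-time seeds first, in order
    seen = set(pool)
    for addr in discovered_addrs:     # fold in discovered masters
        if addr not in seen:          # dedupe while preserving order
            seen.add(addr)
            pool.append(addr)
    return pool
```

One consequence of this design, as noted above, is that a tserver whose gflag-listed masters have all been replaced cannot rediscover the cluster after a restart, which is why persisting one small piece of state is floated at the end of the comment.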
[kudu-CR] docs: informal design for handling permanent master failures
Adar Dembo has posted comments on this change.

Change subject: docs: informal design for handling permanent master failures
......................................................................

Patch Set 1:

Rendered content available here: https://github.com/adembo/kudu/blob/config_change/docs/design-docs/master-perm-failure-1.0.md

--
To view, visit http://gerrit.cloudera.org:8080/3393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2f05c319c89cf37e2d71fdc4b7ec951b2932a2b2
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: David Ribeiro Alves
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Todd Lipcon
Gerrit-HasComments: No
[kudu-CR] docs: informal design for handling permanent master failures
Hello David Ribeiro Alves, Mike Percy, Todd Lipcon,

I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/3393 to review the following change.

Change subject: docs: informal design for handling permanent master failures
......................................................................

docs: informal design for handling permanent master failures

Here's an informal design doc that describes how we might address permanent master failures. It presents a hacky way to handle some permanent failures without implementing full master config change, then describes how config change might be implemented.

The most important question I'd like to discuss first is: should master config change support be implemented? As the doc describes (though not in great detail), config change would be expensive to implement, and we're running out of time for 1.0. We might decide that permanent failure is out of scope for 1.0 (or at least permanent failures that also take out the disk), and push config change out to a different release. If we do that, we could handle migration of one node to three nodes with a script, or not at all (i.e. this is beta software).

Change-Id: I2f05c319c89cf37e2d71fdc4b7ec951b2932a2b2
---
M docs/design-docs/README.md
A docs/design-docs/master-perm-failure-1.0.md
2 files changed, 170 insertions(+), 0 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/93/3393/1

--
To view, visit http://gerrit.cloudera.org:8080/3393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I2f05c319c89cf37e2d71fdc4b7ec951b2932a2b2
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo
Gerrit-Reviewer: David Ribeiro Alves
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Todd Lipcon