Adar Dembo has posted comments on this change.
Change subject: docs: design for handling permanent master failures
Patch Set 3:
Line 52: 3. Copy the master's entire data/WAL directory from **X** to **Y**.
> hrm, this is odd -- I thought in step 2, X died. how are we going to copy i
To be honest, I didn't delve into the various ways in which this condition (that
X is "dead" but its data is salvageable) could be satisfied. Here are some possibilities:
1. X is super old and we'd like to decommission it. It'll be considered "dead"
after the copy.
2. X has a bad DIMM that rarely causes faults. Maybe we'll rip out the bad
DIMM, boot, do the copy, then decommission it.
3. Some other piece of X's hardware is gone, in which case yes, we may move the
Do you think these are too contrived? Should I just rewrite this to dispel any
notion that today's Kudu can recover from some kinds of permanent failure?
Line 114: 2. Find new master machines, creating DNS cnames for all of them.
Create a DNS
> how will this work in the context of a management tool like CM? wouldn't th
I haven't given much thought to CM since it's out of scope for the Kudu
_project_, but yeah, we may need that. Is there a similar concept in HDFS?
Line 136: 2. Implement new command line tool to rewrite cmeta files.
> can we combine these two? something that leads you through the process?
I'd rather have both: a command line tool that can perform each (specific) task
on its own, and a script that ties them together.
Now that I've implemented this, though, it's proving difficult to combine them,
since different pieces of work happen on different machines:
1. On each new master, run new "format" command to create FS.
2. On each new master, run kudu-fs_dump "list_uuid" to get the FS's UUID.
3. On the old master, run new "cmeta rewrite" command with the new UUIDs,
hostports, and existing UUID/hostport.
4. On each new master, run new "tablet copy" command to fetch the master tablet.
I guess it can be done with a shell script that uses ssh to get to each
machine. That won't work in every environment, though.
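A minimal Python sketch of what such a driver could look like, assuming passwordless ssh to every host. The command names and argument formats below are hypothetical placeholders, not the real tools' syntax. Only the ordering comes from the four steps above. The key dependency is that step 3 cannot be built until step 2 has returned every new master's UUID. The `7051` port is an assumed master RPC port.

```python
import shlex
import subprocess
from typing import Callable, Dict, List, Sequence

# A runner executes argv on a host and returns its stdout.
Runner = Callable[[str, List[str]], str]

# HYPOTHETICAL command lines: the real tool names and flags are placeholders.
FORMAT_CMD = ["kudu-format-fs"]                      # step 1 (hypothetical name)
LIST_UUID_CMD = ["kudu-fs_dump", "list_uuid"]        # step 2 (flags assumed)
CMETA_REWRITE_CMD = ["kudu-cmeta", "rewrite"]        # step 3 (hypothetical name)
TABLET_COPY_CMD = ["kudu-tablet", "copy"]            # step 4 (hypothetical name)

MASTER_PORT = 7051  # assumed master RPC port


def ssh_runner(host: str, argv: List[str]) -> str:
    """Run argv on host over ssh and return stdout; raises on non-zero exit."""
    remote = " ".join(shlex.quote(arg) for arg in argv)
    result = subprocess.run(["ssh", host, remote],
                            check=True, capture_output=True, text=True)
    return result.stdout


def migrate_masters(old_master: str, old_uuid: str,
                    new_masters: Sequence[str], run: Runner,
                    port: int = MASTER_PORT) -> Dict[str, str]:
    """Drive the four steps in order; returns {new_master_host: fs_uuid}."""
    new_uuids: Dict[str, str] = {}
    # Steps 1 and 2: on each new master, create the FS and read back its UUID.
    for host in new_masters:
        run(host, FORMAT_CMD)
        new_uuids[host] = run(host, LIST_UUID_CMD).strip()

    # Step 3: on the old master, rewrite cmeta with every peer's UUID/hostport
    # (placeholder "uuid:host:port" format), starting with the existing master.
    peers = [f"{old_uuid}:{old_master}:{port}"]
    peers += [f"{new_uuids[h]}:{h}:{port}" for h in new_masters]
    run(old_master, CMETA_REWRITE_CMD + peers)

    # Step 4: on each new master, fetch the master tablet from the old master.
    for host in new_masters:
        run(host, TABLET_COPY_CMD + [f"{old_master}:{port}"])
    return new_uuids
```

Passing `run` in as a parameter keeps the sequencing separate from the transport. Environments where ssh isn't available could substitute another runner without touching the step logic.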
To view, visit http://gerrit.cloudera.org:8080/3393
Gerrit-Owner: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>