Adar Dembo has posted comments on this change.
Change subject: docs: workflow for master migration
Patch Set 1:
Line 7: docs: workflow for master migration
> A question: will this (or something like this) work to migrate, say, from 3
It won't work for a migration from three to five without more steps.
Specifically, once the three masters have started (after their Raft configs
have been rewritten from the command line), you'd need to wait until all three
have caught up to one another. Otherwise, copying the tablet to the two new
masters could lose data if one of the original three dies afterward.
On top of that, once you have three masters, you probably don't want the outage
that using this workflow entails. Better to "do it right" with Raft config
changes once that's implemented.
Anyway, I'll document that it doesn't work.
Line 236: recovering from permanent master failures greatly, and is highly
recommended. The alias should be
> My "how" referred to "How is the user supposed to do this. What is the goal
I don't know; I guess we just disagree on this. In my experience, step-by-step
product documentation is intentionally dry. When reading it, I don't expect to
learn why something is the way it is; I just expect to solve a problem by
For this particular step, I think it's important to provide some kind of
"carrot" to incentivize users to go through with the DNS changes. Without that, all a
user knows is that it's optional; they don't know whether it's important or
not. But at the same time, we don't want to swamp them with technical details.
I view it as a balancing act that (I agree with you) leaves the more technical
users in the dark, but focuses the doc for everyone else.
If it helps, the "recover from permanent master failure" doc (still in
progress) will talk about this in a little more detail.
Line 241: colocated with other services, though not with another master from
the same configuration.
> what other services? Are we advising that people co-locate the master with
This is my CM experience talking; "other services" refers to any other data system
or load-intensive process that may be deployed in the cluster. I'll clarify a
Line 244: * Identify and record the directory where the master's data will live.
> IMO identify leans more towards "finding the identity" of something vs "cho
Alright, I'll change it.
Line 246: * Optional: configure a DNS cname or /etc/hosts alias to the master's
hostname (e.g. `master-2`,
> same as above
Line 251: . Shut down the entire cluster.
> Does it mean shutting down the machines or just Kudu processes? If the lat
Yeah, I'll clarify that we're talking about the processes here, not the
machines. There's no actual "graceful" shutdown for Kudu though, so I'll elide
that word to avoid confusion.
I've omitted the part about disabling Kudu services. I think it's implied by
"maintenance window". Also, the "undisabling" sentence proved to be
unnecessarily verbose and confusing.
PS1, Line 264:
> Nit: an extra space.
PS1, Line 264: DNS cnames
> Nit: DNS names. Those could be A records, right?
Hmm, OK. I guess it could be either cnames or A records. I'll clarify.
Line 284: . Start the existing master.
> If recommending disabling Kudu services, then 'Enable and start ...'
Yeah, this is the part that I think gets too verbose, which is why I omitted the
"disable" step earlier.
Line 314: are working properly, consider performing the following sanity checks:
> Yeah, that would be my suggestion too. First the user should make sure that
OK, I'll suggest checking that the /masters page on each master's web UI looks
the same (and that one master was elected leader), and running ksck.
To view, visit http://gerrit.cloudera.org:8080/4300
Gerrit-Owner: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: John Russell <jruss...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>