[kudu-CR] docs: workflow for master migration

2016-09-15 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: docs: workflow for master migration
..


Patch Set 4: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4300
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9b9c66505e0efd1f4aef80884346507d4fe08d9c
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] docs: workflow for master migration

2016-09-14 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: docs: workflow for master migration
..


Patch Set 3:

Build Started http://104.196.14.100/job/kudu-gerrit/3426/

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I9b9c66505e0efd1f4aef80884346507d4fe08d9c
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] docs: workflow for master migration

2016-09-14 Thread Adar Dembo (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4300

to look at the new patch set (#3).

Change subject: docs: workflow for master migration
..

docs: workflow for master migration

Change-Id: I9b9c66505e0efd1f4aef80884346507d4fe08d9c
---
M docs/administration.adoc
1 file changed, 160 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/00/4300/3
-- 

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9b9c66505e0efd1f4aef80884346507d4fe08d9c
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] docs: workflow for master migration

2016-09-14 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: docs: workflow for master migration
..


Patch Set 1:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/4300/1//COMMIT_MSG
Commit Message:

Line 7: docs: workflow for master migration
> A question: will this (or something like this) work to migrate, say, from 3
It won't work without more steps for migration from three to five. 
Specifically, once the three masters have started (after their raft configs 
have been rewritten from the command line), you'd need to wait until all three 
have caught up to one another, otherwise copying the tablet to the two new ones 
can incur data loss if one of the original three dies thereafter. 

On top of that, once you have three masters, you probably don't want the outage 
that using this workflow entails. Better to "do it right" with Raft config 
changes once that's implemented.

Anyway, I'll doc that it doesn't work.
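Adar's answer depends on one precondition: the three original masters must have caught up to one another before the master tablet is copied to new masters. Below is a minimal sketch of that check. It assumes each master's last replicated Raft operation is available as a hypothetical `(term, index)` pair. This thread does not say how to obtain those values, and `all_caught_up` is an illustrative name, not a Kudu API.

```python
from typing import Dict, Tuple

# Hypothetical input: master name -> (term, index) of its last replicated op.
OpId = Tuple[int, int]


def all_caught_up(last_op_ids: Dict[str, OpId]) -> bool:
    """Return True only if every master reports the same last op id.

    By Raft's log matching property, logs that contain an entry with the same
    (term, index) are identical up to that entry. So if all masters agree on
    their last entry, no master holds entries the others lack.
    An empty input is treated as "not caught up".
    """
    if not last_op_ids:
        return False
    return len(set(last_op_ids.values())) == 1
```

For example, reports of `(3, 120)`, `(3, 120)` and `(3, 118)` yield False: one master still lags, so copying should wait. The thread still concludes that this workflow is not the right tool for going from three masters to five. The sketch only makes the precondition concrete.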


http://gerrit.cloudera.org:8080/#/c/4300/1/docs/administration.adoc
File docs/administration.adoc:

Line 236:   recovering from permanent master failures greatly, and is highly 
recommended. The alias should be
> My "how" referred to "How is the user supposed to do this. What is the goal
I don't know, I guess we just disagree on this. In my experience step-by-step 
product documentation is intentionally dry. When reading it, I don't expect to 
learn why something is the way it is; I just expect to solve a problem by 
following instructions.

For this particular step, I think it's important to provide some kind of 
"carrot" to incentivize users to go through DNS changes. Without that, all a 
user knows is that it's optional; they don't know whether it's important or 
not. But at the same time, we don't want to swamp them with technical details. 
I view it as a balancing act that (I agree with you) leaves the more technical 
users in the dark, but focuses the doc for everyone else.

If it helps, the "recover from permanent master failure" doc (still in 
progress) will talk about this in a little more detail.


Line 241:   colocated with other services, though not with another master from 
the same configuration.
> what other services? Are we advising that people co-locate the master with 
This is my CM experience talking; "other services" refers to any other data 
system or load-intensive process that may be deployed in the cluster. I'll 
clarify a bit.


Line 244: * Identify and record the directory where the master's data will live.
> IMO identify leans more towards "finding the identity" of something vs "cho
Alright, I'll change it.


Line 246: * Optional: configure a DNS cname or /etc/hosts alias to the master's 
hostname (e.g. `master-2`,
> same as above
See above.


Line 251: . Shut down the entire cluster.
> Does it mean shutting down the machines or just Kudu processes?  If the lat
Yeah, I'll clarify that we're talking about the processes here, not the 
machines. There's no actual "graceful" shutdown for Kudu though, so I'll elide 
that word to avoid confusion.

I've omitted the part about disabling Kudu services. I think it's implied in 
"maintenance window", plus the "undisabling" sentence proved to be 
unnecessarily verbose and confusing.


PS1, Line 264:  
> Nit: an extra space.
Done


PS1, Line 264: DNS cnames
> Nit: DNS names.  Those could be A records, right?
Hmm, OK. I guess it could be either cnames or A records. I'll clarify.


Line 284: . Start the existing master.
> If recommending disabling Kudu services, then 'Enable and start ...'
Yeah, this is the part that I think gets too verbose, which is why I omitted the 
"disable" step from earlier.


Line 314: are working properly, consider performing the following sanity checks:
> Yeah, that would be my suggestion too. First the user should make sure that
OK, I'll add a check that the /masters page on each web UI looks the same (and 
that one master was elected leader), and I'll suggest using ksck.
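The check Adar describes has two parts: every master's /masters page lists the same masters, and exactly one of them is leader. It can be sketched as below. The sketch assumes the operator has already transcribed each page into hypothetical `(address, role)` rows, with roles spelled `LEADER`/`FOLLOWER`. It does not fetch or parse the real web UI, and `views_agree` is an illustrative name.

```python
from typing import Dict, List, Tuple

# Hypothetical shape: rows transcribed from one master's /masters page.
MastersView = List[Tuple[str, str]]  # (address, role)


def views_agree(views: Dict[str, MastersView]) -> bool:
    """True if all web UIs list the same masters and agree on one leader."""
    if not views:
        return False
    # Every view must list the same set of master addresses.
    rosters = {frozenset(addr for addr, _ in rows) for rows in views.values()}
    if len(rosters) != 1:
        return False
    leaders = set()
    for rows in views.values():
        view_leaders = [addr for addr, role in rows if role == "LEADER"]
        if len(view_leaders) != 1:
            return False  # no leader elected, or more than one reported
        leaders.add(view_leaders[0])
    # All views must name the same leader.
    return len(leaders) == 1
```

Any False result means something should be investigated before the migration is declared done.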


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I9b9c66505e0efd1f4aef80884346507d4fe08d9c
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] docs: workflow for master migration

2016-09-06 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: docs: workflow for master migration
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4300/1/docs/administration.adoc
File docs/administration.adoc:

Line 236:   recovering from permanent master failures greatly, and is highly 
recommended. The alias should be
> To clarify, is your question "how does it simplify recovering from permanen
My "how" referred to "How is the user supposed to do this? What are the goal 
and steps?" Someone with no context won't know how or why this "simplifies 
recovering from permanent master failures greatly". It just seems like, with 
the removal of the "complex and distracting" explanation, you settled on 
something that is worse than not having anything at all.
I don't feel strongly about a particular route (full explanation, pointing to 
the design doc, or removing it altogether); I just find that leaving only this 
is confusing.


Line 241:   colocated with other services, though not with another master from 
the same configuration.
What other services? Are we advising that people co-locate the master with a 
tablet server? Make that clear.


Line 244: * Identify and record the directory where the master's data will live.
> I wanted "record" in there to make it clear that the choice being made need
IMO "identify" leans more towards "finding the identity" of something vs. 
"choosing the identity" of something. Breaking that symmetry is exactly my 
point, since above it's the former case and here it's the latter.


Line 314: are working properly, consider performing the following sanity checks:
> Is there a way to list existing masters in the system and  status of each? 
Yeah, that would be my suggestion too. First the user should make sure that the 
system state is the expected one and then, yes, try it. Maybe point to a tool 
(like ksck) that also does scans, as scanning usually requires writing code and 
we wouldn't want an admin to have to write custom code to make sure the 
migration worked.


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I9b9c66505e0efd1f4aef80884346507d4fe08d9c
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward #149
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] docs: workflow for master migration

2016-09-02 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change.

Change subject: docs: workflow for master migration
..


Patch Set 1:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/4300/1//COMMIT_MSG
Commit Message:

Line 7: docs: workflow for master migration
A question: will this (or something like this) work to migrate, say, from a 
3-master configuration to a 5-master configuration? If yes, and it's verified 
that it works, consider mentioning this in the document. Or maybe it should be 
a separate document?


http://gerrit.cloudera.org:8080/#/c/4300/1/docs/administration.adoc
File docs/administration.adoc:

Line 251: . Shut down the entire cluster.
Does it mean shutting down the machines or just Kudu processes?  If the latter, 
consider changing to 'Gracefully shut down Kudu processes on the entire 
cluster.'

Besides, I would consider disabling Kudu services (kudu-master and 
kudu-tserver) for the span of the migration procedure. The reason is to make 
sure the processes do not start automatically if machines are spuriously 
rebooted before the procedure is completed.


PS1, Line 264:  
Nit: an extra space.


PS1, Line 264: DNS cnames
Nit: DNS names.  Those could be A records, right?


Line 284: . Start the existing master.
If recommending disabling Kudu services, then 'Enable and start ...'


Line 301: . Start all of the new masters.
If recommending disabling Kudu services, then 'Enable and start ...'


Line 311: . Start all of the tablet servers.
If recommending disabling Kudu services, then 'Enable and start ...'
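Alexey's suggestion changes the wording of each start step. Below is a small sketch of the resulting order. It uses only the top-level steps quoted in this review; intermediate steps such as the Raft configuration rewrite are deliberately left out. `migration_plan` is a hypothetical helper, not part of Kudu.

```python
from typing import List


def migration_plan(disable_services: bool) -> List[str]:
    """Return the quoted top-level steps in order (intermediate steps omitted).

    With disable_services=True, kudu-master and kudu-tserver are disabled
    right after shutdown so a spurious reboot cannot restart them, and each
    later start step becomes "Enable and start".
    """
    start = "Enable and start" if disable_services else "Start"
    steps = ["Shut down Kudu processes on the entire cluster"]
    if disable_services:
        steps.append("Disable the kudu-master and kudu-tserver services")
    steps += [
        f"{start} the existing master",
        f"{start} all of the new masters",
        f"{start} all of the tablet servers",
    ]
    return steps
```

Adar later left the disable/enable steps out because of verbosity. The `disable_services=False` branch therefore matches the direction the doc took.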


Line 314: are working properly, consider performing the following sanity checks:
> Obviously there are a lot of ways that this workflow could be botched. But,
Is there a way to list the existing masters in the system and the status of 
each? If yes, it might make sense to check that the desired and the actual 
lists match.
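Comparing the desired master list with the actual one comes down to a set difference. The sketch below assumes both lists are plain `host:port` strings that the operator gathered by hand. It case-folds addresses because DNS names are case-insensitive. `diff_masters` is a hypothetical helper.

```python
from typing import Iterable, Set, Tuple


def _normalize(addr: str) -> str:
    # Hostnames are case-insensitive; surrounding whitespace is noise.
    return addr.strip().lower()


def diff_masters(desired: Iterable[str],
                 actual: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (missing, unexpected) master addresses.

    missing:    desired but not present in the actual list
    unexpected: present in the actual list but not desired
    """
    want = {_normalize(a) for a in desired}
    have = {_normalize(a) for a in actual}
    return want - have, have - want
```

An empty pair `(set(), set())` means the lists match.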


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I9b9c66505e0efd1f4aef80884346507d4fe08d9c
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward #149
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] docs: workflow for master migration

2016-09-02 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: docs: workflow for master migration
..


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/4300/1/docs/administration.adoc
File docs/administration.adoc:

Line 202: For true high availability and to avoid a single point of failure, 
Kudu clusters should be created
> remove "true"
Done


Line 236:   recovering from permanent master failures greatly, and is highly 
recommended. The alias should be
> how? are you linking this to somewhere else, or are you adding this info on
To clarify, is your question "how does it simplify recovering from permanent 
master failures"? Or "how do I configure a DNS cname or /etc/hosts alias"?

To the first question: originally I had an explanation embedded in this 
section, but I found it to be too complex and distracting from the rest of the 
workflow, so I removed it. The actual explanation can be found in the "handling 
permanent failure in masters" design doc, but I don't think it makes sense to 
link to a design doc from product documentation.

To the second question: configuring DNS is out of scope of this document 
because we expect administrators to know how to do that already, or to find 
someone in their organization who does.
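DNS configuration is explicitly out of scope here. For readers who choose `/etc/hosts` instead, below is a minimal sketch that renders entries in the `address canonical_hostname alias` layout documented in hosts(5). All sample values are hypothetical: the addresses come from the RFC 5737 documentation range, and the `master-N` aliases follow the `master-2` example in the quoted doc line.

```python
from typing import List, Tuple

# (IP address, canonical hostname, alias) for each master host.
HostEntry = Tuple[str, str, str]


def render_hosts(entries: List[HostEntry]) -> str:
    """Render one /etc/hosts line per master: address, canonical name, alias."""
    return "\n".join(f"{ip}\t{fqdn}\t{alias}" for ip, fqdn, alias in entries)


if __name__ == "__main__":
    print(render_hosts([
        ("192.0.2.11", "node1.example.com", "master-1"),
        ("192.0.2.12", "node2.example.com", "master-2"),
        ("192.0.2.13", "node3.example.com", "master-3"),
    ]))
```

Note that the alias stays abstract (`master-1` rather than a machine name), which is what the quoted doc line asks for.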


Line 244: * Identify and record the directory where the master's data will live.
> this is confusing, the user can "choose" these now since it's running new d
I wanted "record" in there to make it clear that the choice being made needs to 
be remembered for later on.

After that, "choose" and "identify" seem synonymous enough to me, and I think 
it's good to evoke symmetry with respect to the previous step (where we 
identified and recorded the data directory of the existing master).


Line 282: ** `port` is the master's previously recorded RPC port number
> it would be good to have examples for an actual correct command, maybe some
I've added an example for each command.
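The quoted doc line describes a `port` field for each master in the command, but the thread does not reproduce the command itself. The sketch below assembles per-master entries under an assumption not confirmed here: each entry is formatted as `uuid:hostname:port`, and entries are separated by spaces. `MasterEntry` and `render_entries` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MasterEntry:
    uuid: str      # master's recorded UUID (assumed field)
    hostname: str  # hostname or DNS alias, e.g. master-2
    port: int      # the master's previously recorded RPC port number

    def render(self) -> str:
        if not 0 < self.port < 65536:
            raise ValueError(f"invalid port for {self.hostname}: {self.port}")
        return f"{self.uuid}:{self.hostname}:{self.port}"


def render_entries(entries: List[MasterEntry]) -> str:
    """Join rendered entries with spaces, rejecting duplicate UUIDs."""
    uuids = [e.uuid for e in entries]
    if len(set(uuids)) != len(uuids):
        raise ValueError("duplicate master UUID")
    return " ".join(e.render() for e in entries)
```

Validating the port and the UUID uniqueness up front catches transcription mistakes before anything is written.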


Line 314: are working properly, consider performing the following sanity checks:
> how does this validate that the new masters are working properly? if the us
Obviously there are a lot of ways that this workflow could be botched. But, if 
the user at least rewrote the Raft configuration on the existing master such 
that it includes multiple entries, a scan will fail if there aren't healthy 
masters behind those entries (because the client will look for a leader master 
and won't find one, since the one working master can't get a majority of votes).

What would you propose as a better means of post-migration validation?
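Adar's argument rests on Raft's majority rule: a leader needs votes from a strict majority of the configured masters. Below is a minimal sketch of that arithmetic. It is standard Raft quorum math, not Kudu-specific code.

```python
def majority(total: int) -> int:
    """Smallest number of voters that forms a strict majority."""
    return total // 2 + 1


def can_elect_leader(total: int, healthy: int) -> bool:
    """A leader can be elected only if a majority of configured masters respond."""
    return healthy >= majority(total)


def tolerated_failures(total: int) -> int:
    """Number of masters that can fail while a majority remains available."""
    return (total - 1) // 2
```

Take a config rewritten to three entries with only one healthy master: `can_elect_leader(3, 1)` is False, so clients find no leader and the scan fails. Now take a rewrite that silently did nothing, leaving a one-entry config: `can_elect_leader(1, 1)` is True and the scan succeeds anyway. That is the gap David's question points at.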


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I9b9c66505e0efd1f4aef80884346507d4fe08d9c
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Anonymous Coward #149
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] docs: workflow for master migration

2016-09-02 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: docs: workflow for master migration
..


Patch Set 2:

Build Started http://104.196.14.100/job/kudu-gerrit/3221/

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I9b9c66505e0efd1f4aef80884346507d4fe08d9c
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Anonymous Coward #149
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] docs: workflow for master migration

2016-09-02 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: docs: workflow for master migration
..


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4300/1/docs/administration.adoc
File docs/administration.adoc:

Line 202: For true high availability and to avoid a single point of failure, 
Kudu clusters should be created
remove "true"


Line 236:   recovering from permanent master failures greatly, and is highly 
recommended. The alias should be
How? Are you linking this to somewhere else, or are you adding this info in a 
later patch? If it's the latter, I would recommend adding this info there.


Line 244: * Identify and record the directory where the master's data will live.
This is confusing; the user can "choose" these now since they're running new 
daemons. What should be identified and recorded?


Line 246: * Optional: configure a DNS cname or /etc/hosts alias to the master's 
hostname (e.g. `master-2`,
same as above


Line 282: ** `port` is the master's previously recorded RPC port number
It would be good to have examples of an actual correct command, maybe 
somewhere else too.


Line 314: are working properly, consider performing the following sanity checks:
How does this validate that the new masters are working properly? If the user 
did something wrong and nothing changed, wouldn't this work anyway?


-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I9b9c66505e0efd1f4aef80884346507d4fe08d9c
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Anonymous Coward #149
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] docs: workflow for master migration

2016-09-01 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: docs: workflow for master migration
..


Patch Set 1:

I tested this manually on a 4-node cluster, using CM and /etc/hosts aliases. 
The cluster is alive and (seemingly) well, but I'll do more testing on it 
tomorrow to make sure.

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I9b9c66505e0efd1f4aef80884346507d4fe08d9c
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] docs: workflow for master migration

2016-09-01 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: docs: workflow for master migration
..


Patch Set 1:

Build Started http://104.196.14.100/job/kudu-gerrit/3205/

-- 

Gerrit-MessageType: comment
Gerrit-Change-Id: I9b9c66505e0efd1f4aef80884346507d4fe08d9c
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] docs: workflow for master migration

2016-09-01 Thread Adar Dembo (Code Review)
Hello Todd Lipcon,

I'd like you to do a code review.  Please visit

http://gerrit.cloudera.org:8080/4300

to review the following change.

Change subject: docs: workflow for master migration
..

docs: workflow for master migration

Change-Id: I9b9c66505e0efd1f4aef80884346507d4fe08d9c
---
M docs/administration.adoc
1 file changed, 122 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/00/4300/1
-- 

Gerrit-MessageType: newchange
Gerrit-Change-Id: I9b9c66505e0efd1f4aef80884346507d4fe08d9c
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Todd Lipcon