[kudu-CR] [tools] ksck improvements [3/n]: master consensus checks
Will Berkeley has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9883 ) Change subject: [tools] ksck improvements [3/n]: master consensus checks .. [tools] ksck improvements [3/n]: master consensus checks This patch adds consensus consistency checks and a consensus matrix for the master tablet. It's a little trickier than for tablets, because there's no uuid available for an unavailable master. Here's a sample matrix when a master is down: WARNING: masters have consensus conflicts All reported masters are: A = c9afe67944784c70ab13989354c10f9e B = (localhost:7052) C = 488cf2ab551b422aa8c683a054cdcde3 D = 62dc9b532b084a459130f1676d8e1998 Config source |Replicas| Current term | Config index | Committed? ---++--+--+ A | A C* D | 4| -1 | Yes B | [config not available] | | | C | A C* D | 4| -1 | Yes Change-Id: I1015854759debb3f1acf4bf9eb143260547f4a4b Reviewed-on: http://gerrit.cloudera.org:8080/9883 Reviewed-by: Attila Bukor Tested-by: Will Berkeley Reviewed-by: Andrew Wong --- M src/kudu/integration-tests/cluster_verifier.cc M src/kudu/tools/CMakeLists.txt M src/kudu/tools/ksck-test.cc M src/kudu/tools/ksck.cc M src/kudu/tools/ksck.h M src/kudu/tools/ksck_remote-test.cc M src/kudu/tools/ksck_remote.cc M src/kudu/tools/tool_action_cluster.cc 8 files changed, 306 insertions(+), 102 deletions(-) Approvals: Attila Bukor: Looks good to me, but someone else must approve Will Berkeley: Verified Andrew Wong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/9883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I1015854759debb3f1acf4bf9eb143260547f4a4b Gerrit-Change-Number: 9883 Gerrit-PatchSet: 9 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] [tools] ksck improvements [3/n]: master consensus checks
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9883 ) Change subject: [tools] ksck improvements [3/n]: master consensus checks .. Patch Set 8: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/9883/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9883/7//COMMIT_MSG@13 PS7, Line 13: Here's a sample matrix when a master is down: > No this is what it looks like when a master is down. B = D, but we can't te Ah gotcha. -- To view, visit http://gerrit.cloudera.org:8080/9883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1015854759debb3f1acf4bf9eb143260547f4a4b Gerrit-Change-Number: 9883 Gerrit-PatchSet: 8 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Tue, 10 Apr 2018 19:37:07 + Gerrit-HasComments: Yes
[kudu-CR] [tools] ksck improvements [3/n]: master consensus checks
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9883 ) Change subject: [tools] ksck improvements [3/n]: master consensus checks .. Patch Set 8: Verified+1 (4 comments) IWYU's suggestion is wrong- implementing it breaks the build http://gerrit.cloudera.org:8080/#/c/9883/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9883/7//COMMIT_MSG@13 PS7, Line 13: Here's a sample matrix when a master is down: > nit: this isn't just "a master is down", right? It's the wrong uuid case? No this is what it looks like when a master is down. B = D, but we can't tell that if we can't get a UUID from B. In the tablet server case we have a uuid for B from the master. http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck.cc File src/kudu/tools/ksck.cc: http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck.cc@293 PS5, Line 293: > I agree this would make the code a little clearer but I think we should lea Refactored so the uuid of a master with unknown uuid is always kDummyUuid and then the address in parens, so this method goes away. http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck.cc@461 PS5, Line 461: : return table.PrintTo(out); > The scripts should be easy to rewrite though. To keep backwards compatibili Done http://gerrit.cloudera.org:8080/#/c/9883/6/src/kudu/tools/ksck.cc File src/kudu/tools/ksck.cc: http://gerrit.cloudera.org:8080/#/c/9883/6/src/kudu/tools/ksck.cc@150 PS6, Line 150: for (const auto& pb : peers) { > these seem to be unused now Done -- To view, visit http://gerrit.cloudera.org:8080/9883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1015854759debb3f1acf4bf9eb143260547f4a4b Gerrit-Change-Number: 9883 Gerrit-PatchSet: 8 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Tue, 10 Apr 2018 19:20:57 + Gerrit-HasComments: Yes
[kudu-CR] [tools] ksck improvements [3/n]: master consensus checks
Will Berkeley has removed Kudu Jenkins from this change. ( http://gerrit.cloudera.org:8080/9883 ) Change subject: [tools] ksck improvements [3/n]: master consensus checks .. Removed reviewer Kudu Jenkins with the following votes: * Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/9883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteReviewer Gerrit-Change-Id: I1015854759debb3f1acf4bf9eb143260547f4a4b Gerrit-Change-Number: 9883 Gerrit-PatchSet: 8 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] [tools] ksck improvements [3/n]: master consensus checks
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/9883 ) Change subject: [tools] ksck improvements [3/n]: master consensus checks .. Patch Set 8: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/9883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1015854759debb3f1acf4bf9eb143260547f4a4b Gerrit-Change-Number: 9883 Gerrit-PatchSet: 8 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Tue, 10 Apr 2018 18:46:35 + Gerrit-HasComments: No
[kudu-CR] [tools] ksck improvements [3/n]: master consensus checks
Hello Mike Percy, Attila Bukor, Kudu Jenkins, Andrew Wong, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9883 to look at the new patch set (#8). Change subject: [tools] ksck improvements [3/n]: master consensus checks .. [tools] ksck improvements [3/n]: master consensus checks This patch adds consensus consistency checks and a consensus matrix for the master tablet. It's a little trickier than for tablets, because there's no uuid available for an unavailable master. Here's a sample matrix when a master is down: WARNING: masters have consensus conflicts All reported masters are: A = c9afe67944784c70ab13989354c10f9e B = (localhost:7052) C = 488cf2ab551b422aa8c683a054cdcde3 D = 62dc9b532b084a459130f1676d8e1998 Config source |Replicas| Current term | Config index | Committed? ---++--+--+ A | A C* D | 4| -1 | Yes B | [config not available] | | | C | A C* D | 4| -1 | Yes Change-Id: I1015854759debb3f1acf4bf9eb143260547f4a4b --- M src/kudu/integration-tests/cluster_verifier.cc M src/kudu/tools/CMakeLists.txt M src/kudu/tools/ksck-test.cc M src/kudu/tools/ksck.cc M src/kudu/tools/ksck.h M src/kudu/tools/ksck_remote-test.cc M src/kudu/tools/ksck_remote.cc M src/kudu/tools/tool_action_cluster.cc 8 files changed, 306 insertions(+), 102 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/9883/8 -- To view, visit http://gerrit.cloudera.org:8080/9883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1015854759debb3f1acf4bf9eb143260547f4a4b Gerrit-Change-Number: 9883 Gerrit-PatchSet: 8 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] [tools] ksck improvements [3/n]: master consensus checks
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9883 ) Change subject: [tools] ksck improvements [3/n]: master consensus checks .. Patch Set 7: Code-Review+1 (1 comment) Also some IWYU http://gerrit.cloudera.org:8080/#/c/9883/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9883/7//COMMIT_MSG@13 PS7, Line 13: Here's a sample matrix when a master is down: nit: this isn't just "a master is down", right? It's the wrong uuid case? -- To view, visit http://gerrit.cloudera.org:8080/9883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1015854759debb3f1acf4bf9eb143260547f4a4b Gerrit-Change-Number: 9883 Gerrit-PatchSet: 7 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Tue, 10 Apr 2018 17:17:57 + Gerrit-HasComments: Yes
[kudu-CR] [tools] ksck improvements [3/n]: master consensus checks
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/9883 ) Change subject: [tools] ksck improvements [3/n]: master consensus checks .. Patch Set 7: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/9883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1015854759debb3f1acf4bf9eb143260547f4a4b Gerrit-Change-Number: 9883 Gerrit-PatchSet: 7 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Tue, 10 Apr 2018 16:27:32 + Gerrit-HasComments: No
[kudu-CR] [tools] ksck improvements [3/n]: master consensus checks
Hello Mike Percy, Attila Bukor, Kudu Jenkins, Andrew Wong, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9883 to look at the new patch set (#7). Change subject: [tools] ksck improvements [3/n]: master consensus checks .. [tools] ksck improvements [3/n]: master consensus checks This patch adds consensus consistency checks and a consensus matrix for the master tablet. It's a little trickier than for tablets, because there's no uuid available for an unavailable master. Here's a sample matrix when a master is down: WARNING: masters have consensus conflicts All reported masters are: A = c9afe67944784c70ab13989354c10f9e B = (localhost:7052) C = 488cf2ab551b422aa8c683a054cdcde3 D = 62dc9b532b084a459130f1676d8e1998 Config source |Replicas| Current term | Config index | Committed? ---++--+--+ A | A C* D | 4| -1 | Yes B | [config not available] | | | C | A C* D | 4| -1 | Yes Change-Id: I1015854759debb3f1acf4bf9eb143260547f4a4b --- M src/kudu/integration-tests/cluster_verifier.cc M src/kudu/tools/CMakeLists.txt M src/kudu/tools/ksck-test.cc M src/kudu/tools/ksck.cc M src/kudu/tools/ksck.h M src/kudu/tools/ksck_remote-test.cc M src/kudu/tools/ksck_remote.cc M src/kudu/tools/tool_action_cluster.cc 8 files changed, 307 insertions(+), 103 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/9883/7 -- To view, visit http://gerrit.cloudera.org:8080/9883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1015854759debb3f1acf4bf9eb143260547f4a4b Gerrit-Change-Number: 9883 Gerrit-PatchSet: 7 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] [tools] ksck improvements [3/n]: master consensus checks
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/9883 ) Change subject: [tools] ksck improvements [3/n]: master consensus checks .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/9883/6/src/kudu/tools/ksck.cc File src/kudu/tools/ksck.cc: http://gerrit.cloudera.org:8080/#/c/9883/6/src/kudu/tools/ksck.cc@150 PS6, Line 150: vector voter_uuids; these seem to be unused now -- To view, visit http://gerrit.cloudera.org:8080/9883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1015854759debb3f1acf4bf9eb143260547f4a4b Gerrit-Change-Number: 9883 Gerrit-PatchSet: 6 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Tue, 10 Apr 2018 10:01:41 + Gerrit-HasComments: Yes
[kudu-CR] [tools] ksck improvements [3/n]: master consensus checks
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/9883 ) Change subject: [tools] ksck improvements [3/n]: master consensus checks .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck.cc File src/kudu/tools/ksck.cc: http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck.cc@134 PS5, Line 134: ksck_cstate->term = cstate.current_term(); > I wouldn't be able to declare config as const in that case :( it would've just been easier to parse visaully, I didn't think about config being const. Let's leave this the way it is. http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck.cc@461 PS5, Line 461: we do not know the uuid in that case. : std::sort(summaries.begin(), summaries.e > Ah this sort of thing might be nice but I think it'd mess up some scripts w The scripts should be easy to rewrite though. To keep backwards compatibility, maybe there should be a flag to include RPC address in ksck output. -- To view, visit http://gerrit.cloudera.org:8080/9883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1015854759debb3f1acf4bf9eb143260547f4a4b Gerrit-Change-Number: 9883 Gerrit-PatchSet: 6 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Tue, 10 Apr 2018 09:59:41 + Gerrit-HasComments: Yes
[kudu-CR] [tools] ksck improvements [3/n]: master consensus checks
Hello Mike Percy, Attila Bukor, Andrew Wong, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9883 to look at the new patch set (#6). Change subject: [tools] ksck improvements [3/n]: master consensus checks .. [tools] ksck improvements [3/n]: master consensus checks This patch adds consensus consistency checks and a consensus matrix for the master tablet. It's a little trickier than for tablets, because there's no uuid available for an unavailable master. Here's a sample matrix when a master is down: WARNING: masters have consensus conflicts All reported masters are: A = c9afe67944784c70ab13989354c10f9e B = (localhost:7052) C = 488cf2ab551b422aa8c683a054cdcde3 D = 62dc9b532b084a459130f1676d8e1998 Config source |Replicas| Current term | Config index | Committed? ---++--+--+ A | A C* D | 4| -1 | Yes B | [config not available] | | | C | A C* D | 4| -1 | Yes Change-Id: I1015854759debb3f1acf4bf9eb143260547f4a4b --- M src/kudu/integration-tests/cluster_verifier.cc M src/kudu/tools/CMakeLists.txt M src/kudu/tools/ksck-test.cc M src/kudu/tools/ksck.cc M src/kudu/tools/ksck.h M src/kudu/tools/ksck_remote-test.cc M src/kudu/tools/ksck_remote.cc M src/kudu/tools/tool_action_cluster.cc 8 files changed, 311 insertions(+), 91 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/9883/6 -- To view, visit http://gerrit.cloudera.org:8080/9883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1015854759debb3f1acf4bf9eb143260547f4a4b Gerrit-Change-Number: 9883 Gerrit-PatchSet: 6 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] [tools] ksck improvements [3/n]: master consensus checks
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9883 ) Change subject: [tools] ksck improvements [3/n]: master consensus checks .. Patch Set 5: -Verified (11 comments) http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck-test.cc File src/kudu/tools/ksck-test.cc: http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck-test.cc@409 PS5, Line 409: : TEST_F(KsckTest, TestWrongMasterUuid) { > nit: maybe add a comment explaining an example of when we might expect to g Done http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck-test.cc@454 PS5, Line 454: " Config source | Replicas | Current term | Config index | Committed?\n" : "---+--+--+--+\n" : " A | A* B C| 0| | Yes\n" : " B | A B* C| 0| | Yes\n" : " C | A* B C| 0| | Yes\n"); > nit: Do you think it's worth adding any additional info here, like "Expecte I think that's clear enough from the consensus matrix. Pointing it out especially doesn't really change or help anything, I don't think. http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck.h File src/kudu/tools/ksck.h: http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck.h@121 PS5, Line 121: KsckConsensusState() = default; > Is this needed so we can wrap boost::optional<>? No, it's needed because there's also an explicit constructor, so the compiler won't implicitly implement a 0-arg constructor. http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck.h@305 PS5, Line 305: boost::optional cstate_; > nit: doc when might this be none? Done http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck.cc File src/kudu/tools/ksck.cc: http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck.cc@102 PS5, Line 102: // Format replicas known and unknown to 'config' using labels from 'uuid_mapping'. > nit: Hmm maybe I'll be able to parse this better after reading through the Done http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck.cc@107 PS5, Line 107: replicas > nit: similarly, maybe label_and_replica_pair or something? Or if this is a Howabout labeled_replica as a name for the pair? Making a struct out of it is appealing but I think the scope of usage is small enough it's easier just to use a std::pair. http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck.cc@134 PS5, Line 134: ksck_cstate->type = cstate.has_pending_config() ? > shouldn't these two be grouped under a single I wouldn't be able to declare config as const in that case :( Beyond that, doesn't seem to matter much which way it's done. If you really have a strong pref I can change it. http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck.cc@143 PS5, Line 143: !cstate.leader_uuid().empty() > Is this different than cstate.has_leader_uuid()? When might the cstate have If memory serves, the reason it's like this is because Kudu servers set leader_uuid to "" explicitly when there's no leader, so we test for emptiness instead of non-presence. I added a comment explaining this. http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck.cc@147 PS5, Line 147: vector voter_uuids; : vector non_voter_uuids; > Why this extra buffering instead of inserting directly into the sets? Good catch. Not sure. Maybe it was part of a function that could fail in the past? Changed it. http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck.cc@293 PS5, Line 293: uuid > I agree and based on the usages below I believe this method could return uu I agree this would make the code a little clearer but I think we should leave it for the moment. See my response to one of Andrew's comment below. http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck.cc@461 PS5, Line 461: The address is used in the sort for the unavailable master case, because : // we do not know the uuid in that case. > How feasible would it be to assign the "kDummyId (hostport)" string as the Ah this sort of thing might be nice but I think it'd mess up some scripts we commonly use to fix broken tablets :(. The n in [3/n], [4/n],... will eventually get big enough to see a patch where ksck can auto-repair these common broken situations so we can change the output more freely. Keep in mind also that, in both the master and tserver case, the addresses of the nodes are listed just above the consensus matrix part, so we'd be repeating ourselves a bit. -- To view, visit http://gerrit.cloudera.org:8080/9883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I10
[kudu-CR] [tools] ksck improvements [3/n]: master consensus checks
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/9883 ) Change subject: [tools] ksck improvements [3/n]: master consensus checks .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck.cc File src/kudu/tools/ksck.cc: http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck.cc@134 PS5, Line 134: ksck_cstate->type = cstate.has_pending_config() ? shouldn't these two be grouped under a single if (cstate.has_pending_config()) { } else { } block? http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck.cc@293 PS5, Line 293: uuid > If we don't already have this listed somewhere, maybe adding `master->addre I agree and based on the usages below I believe this method could return uuid + master address in every case. -- To view, visit http://gerrit.cloudera.org:8080/9883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1015854759debb3f1acf4bf9eb143260547f4a4b Gerrit-Change-Number: 9883 Gerrit-PatchSet: 5 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Mon, 09 Apr 2018 20:41:00 + Gerrit-HasComments: Yes
[kudu-CR] [tools] ksck improvements [3/n]: master consensus checks
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9883 ) Change subject: [tools] ksck improvements [3/n]: master consensus checks .. Patch Set 5: (10 comments) http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck-test.cc File src/kudu/tools/ksck-test.cc: http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck-test.cc@409 PS5, Line 409: : TEST_F(KsckTest, TestWrongMasterUuid) { nit: maybe add a comment explaining an example of when we might expect to get into this state IIUC, the operator specified masters A, B, C, but the actual configs point to A, B, D. http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck-test.cc@454 PS5, Line 454: " Config source | Replicas | Current term | Config index | Committed?\n" : "---+--+--+--+\n" : " A | A* B C| 0| | Yes\n" : " B | A B* C| 0| | Yes\n" : " C | A* B C| 0| | Yes\n"); nit: Do you think it's worth adding any additional info here, like "Expected a single leader but got multiple"? http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck.h File src/kudu/tools/ksck.h: http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck.h@121 PS5, Line 121: KsckConsensusState() = default; Is this needed so we can wrap boost::optional<>? http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck.h@305 PS5, Line 305: boost::optional cstate_; nit: doc when might this be none? http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck.cc File src/kudu/tools/ksck.cc: http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck.cc@102 PS5, Line 102: // Format replicas known and unknown to 'config' using labels from 'uuid_mapping'. nit: Hmm maybe I'll be able to parse this better after reading through the rest of the patch, but right now, I'm finding this a little confusing. "Return a formatted stringified version of 'config', mapping UUIDs to single-character labels with the mapping 'uuid_mapping'." nit: Also not sure if this is overkill but maybe rename 'uuid_mapping' to 'label_by_uuid' or something? Ah I now see that you just moved this from down below, so feel doubly inclined to disregard if you so please. http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck.cc@107 PS5, Line 107: replicas nit: similarly, maybe label_and_replica_pair or something? Or if this is a common pattern, maybe consider: struct label_and_pair { char label; string replica_uuid; } ...so that you can refer to them below as e.replica_uuid and e.label instead of first/second. Although I don't feel super strongly about either name change so feel free to ignore. Ah I now see that you just moved this from down below, so feel doubly inclined to disregard if you so please. http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck.cc@143 PS5, Line 143: !cstate.leader_uuid().empty() Is this different than cstate.has_leader_uuid()? When might the cstate have an empty string as a leader_uuid? http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck.cc@147 PS5, Line 147: vector voter_uuids; : vector non_voter_uuids; Why this extra buffering instead of inserting directly into the sets? http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck.cc@293 PS5, Line 293: uuid If we don't already have this listed somewhere, maybe adding `master->address()` would be beneficial for known UUIDs as well. At least basing off of the sample output in the commit message, it'd probably be helpful to list the UUID --> hostport mapping if we don't. Although since you're keying on it below, maybe the extra labeling belongs somewhere else. http://gerrit.cloudera.org:8080/#/c/9883/5/src/kudu/tools/ksck.cc@461 PS5, Line 461: The address is used in the sort for the unavailable master case, because : // we do not know the uuid in that case. How feasible would it be to assign the "kDummyId (hostport)" string as the UUID for unavailable masters? With that, I think we'd be able to keep this block of code the same and avoid an extra step in CheckMasterConsensus(). -- To view, visit http://gerrit.cloudera.org:8080/9883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1015854759debb3f1acf4bf9eb143260547f4a4b Gerrit-Change-Number: 9883 Gerrit-PatchSet: 5 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Mon, 09 Apr 2018 19:04:44 + Gerrit-HasComments: Ye
[kudu-CR] [tools] ksck improvements [3/n]: master consensus checks
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9883 ) Change subject: [tools] ksck improvements [3/n]: master consensus checks .. Patch Set 5: Verified+1 Raft consensus itest flake :( -- To view, visit http://gerrit.cloudera.org:8080/9883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1015854759debb3f1acf4bf9eb143260547f4a4b Gerrit-Change-Number: 9883 Gerrit-PatchSet: 5 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Thu, 05 Apr 2018 22:25:50 + Gerrit-HasComments: No
[kudu-CR] [tools] ksck improvements [3/n]: master consensus checks
Will Berkeley has removed Kudu Jenkins from this change. ( http://gerrit.cloudera.org:8080/9883 ) Change subject: [tools] ksck improvements [3/n]: master consensus checks .. Removed reviewer Kudu Jenkins with the following votes: * Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/9883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteReviewer Gerrit-Change-Id: I1015854759debb3f1acf4bf9eb143260547f4a4b Gerrit-Change-Number: 9883 Gerrit-PatchSet: 5 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] [tools] ksck improvements [3/n]: master consensus checks
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9883 to look at the new patch set (#4). Change subject: [tools] ksck improvements [3/n]: master consensus checks .. [tools] ksck improvements [3/n]: master consensus checks This patch adds consensus consistency checks and a consensus matrix for the master tablet. It's a little trickier than for tablets, because there's no uuid available for an unavailable master. Here's a sample matrix when a master is down: WARNING: masters have consensus conflicts All reported masters are: A = c9afe67944784c70ab13989354c10f9e B = (localhost:7052) C = 488cf2ab551b422aa8c683a054cdcde3 D = 62dc9b532b084a459130f1676d8e1998 Config source |Replicas| Current term | Config index | Committed? ---++--+--+ A | A C* D | 4| -1 | Yes B | [config not available] | | | C | A C* D | 4| -1 | Yes Change-Id: I1015854759debb3f1acf4bf9eb143260547f4a4b --- M src/kudu/integration-tests/cluster_verifier.cc M src/kudu/tools/CMakeLists.txt M src/kudu/tools/ksck-test.cc M src/kudu/tools/ksck.cc M src/kudu/tools/ksck.h M src/kudu/tools/ksck_remote-test.cc M src/kudu/tools/ksck_remote.cc M src/kudu/tools/tool_action_cluster.cc 8 files changed, 307 insertions(+), 91 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/9883/4 -- To view, visit http://gerrit.cloudera.org:8080/9883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1015854759debb3f1acf4bf9eb143260547f4a4b Gerrit-Change-Number: 9883 Gerrit-PatchSet: 4 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Will Berkeley
[kudu-CR] [tools] ksck improvements [3/n]: master consensus checks
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9883 ) Change subject: [tools] ksck improvements [3/n]: master consensus checks .. Patch Set 3: Precommit failure was flake in TabletCopyServiceTest.TestSimpleBeginEndSession due to Jenkins clock sync issues. -- To view, visit http://gerrit.cloudera.org:8080/9883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1015854759debb3f1acf4bf9eb143260547f4a4b Gerrit-Change-Number: 9883 Gerrit-PatchSet: 3 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Mon, 02 Apr 2018 21:36:59 + Gerrit-HasComments: No
[kudu-CR] [tools] ksck improvements [3/n]: master consensus checks
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9883 to look at the new patch set (#3). Change subject: [tools] ksck improvements [3/n]: master consensus checks .. [tools] ksck improvements [3/n]: master consensus checks This patch adds consensus consistency checks and a consensus matrix for the master tablet. It's a little trickier than for tablets, because there's no uuid available for an unavailable master. Here's a sample matrix when a master is down: WARNING: masters have consensus conflicts All reported masters are: A = c9afe67944784c70ab13989354c10f9e B = (localhost:7052) C = 488cf2ab551b422aa8c683a054cdcde3 D = 62dc9b532b084a459130f1676d8e1998 Config source |Replicas| Current term | Config index | Committed? ---++--+--+ A | A C* D | 4| -1 | Yes B | [config not available] | | | C | A C* D | 4| -1 | Yes Change-Id: I1015854759debb3f1acf4bf9eb143260547f4a4b --- M src/kudu/integration-tests/cluster_verifier.cc M src/kudu/tools/CMakeLists.txt M src/kudu/tools/ksck-test.cc M src/kudu/tools/ksck.cc M src/kudu/tools/ksck.h M src/kudu/tools/ksck_remote-test.cc M src/kudu/tools/ksck_remote.cc M src/kudu/tools/tool_action_cluster.cc 8 files changed, 307 insertions(+), 90 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/9883/3 -- To view, visit http://gerrit.cloudera.org:8080/9883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1015854759debb3f1acf4bf9eb143260547f4a4b Gerrit-Change-Number: 9883 Gerrit-PatchSet: 3 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] [tools] ksck improvements [3/n]: master consensus checks
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9883 to look at the new patch set (#2). Change subject: [tools] ksck improvements [3/n]: master consensus checks .. [tools] ksck improvements [3/n]: master consensus checks This patch adds consensus consistency checks and a consensus matrix for the master tablet. It's a little trickier than for tablets, because there's no uuid available for an unavailable master. Here's a sample matrix when a master is down: WARNING: masters have consensus conflicts All reported masters are: A = c9afe67944784c70ab13989354c10f9e B = (localhost:7052) C = 488cf2ab551b422aa8c683a054cdcde3 D = 62dc9b532b084a459130f1676d8e1998 Config source |Replicas| Current term | Config index | Committed? ---++--+--+ A | A C* D | 4| -1 | Yes B | [config not available] | | | C | A C* D | 4| -1 | Yes Change-Id: I1015854759debb3f1acf4bf9eb143260547f4a4b --- M src/kudu/integration-tests/cluster_verifier.cc M src/kudu/tools/CMakeLists.txt M src/kudu/tools/ksck-test.cc M src/kudu/tools/ksck.cc M src/kudu/tools/ksck.h M src/kudu/tools/ksck_remote-test.cc M src/kudu/tools/ksck_remote.cc M src/kudu/tools/tool_action_cluster.cc 8 files changed, 308 insertions(+), 90 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/9883/2 -- To view, visit http://gerrit.cloudera.org:8080/9883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1015854759debb3f1acf4bf9eb143260547f4a4b Gerrit-Change-Number: 9883 Gerrit-PatchSet: 2 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] [tools] ksck improvements [3/n]: master consensus checks
Will Berkeley has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9883 Change subject: [tools] ksck improvements [3/n]: master consensus checks .. [tools] ksck improvements [3/n]: master consensus checks This patch adds consensus consistency checks and a consensus matrix for the master tablet. It's a little trickier than for tablets, because there's no uuid available for an unavailable master. Here's a sample matrix when a master is down: WARNING: masters have consensus conflicts All reported masters are: A = c9afe67944784c70ab13989354c10f9e B = (localhost:7052) C = 488cf2ab551b422aa8c683a054cdcde3 D = 62dc9b532b084a459130f1676d8e1998 Config source |Replicas| Current term | Config index | Committed? ---++--+--+ A | A C* D | 4| -1 | Yes B | [config not available] | | | C | A C* D | 4| -1 | Yes Change-Id: I1015854759debb3f1acf4bf9eb143260547f4a4b --- M src/kudu/integration-tests/cluster_verifier.cc M src/kudu/tools/CMakeLists.txt M src/kudu/tools/ksck-test.cc M src/kudu/tools/ksck.cc M src/kudu/tools/ksck.h M src/kudu/tools/ksck_remote-test.cc M src/kudu/tools/ksck_remote.cc M src/kudu/tools/tool_action_cluster.cc 8 files changed, 302 insertions(+), 90 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/9883/1 -- To view, visit http://gerrit.cloudera.org:8080/9883 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I1015854759debb3f1acf4bf9eb143260547f4a4b Gerrit-Change-Number: 9883 Gerrit-PatchSet: 1 Gerrit-Owner: Will Berkeley