[kudu-CR] KUDU-2129 make ksck less scary when copying
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9528 )

Change subject: KUDU-2129 make ksck less scary when copying
......................................................................

KUDU-2129 make ksck less scary when copying

This patch changes some of the outputs for ksck to be less troubling
when the cluster is recovering. Updates include:
- Report tablets with replicas whose data states are
  TABLET_DATA_COPYING as RECOVERING instead of UNDER_REPLICATED.
- Report tables with recovering tablets as RECOVERING instead of
  UNDER_REPLICATED.
- Report replicas that aren't running as "not running" instead of "bad
  state". Most non-RUNNING states have a time and place during a
  healthy cluster's lifespan.
- Don't log the error type with the errors when running the `ksck`
  tool. Our messages are generally good enough, and there isn't a
  perfect mapping to some errors. E.g. we should take the "Corruption"
  out of "Corruption: 1 out of 1 table(s) are bad"
- Report "not healthy" instead of "bad" in logs like the one above

Change-Id: Iba0c1f5b5a7083cbef99d3674dfebe1457075ce0
Reviewed-on: http://gerrit.cloudera.org:8080/9528
Tested-by: Kudu Jenkins
Reviewed-by: Will Berkeley
---
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/tool_action_cluster.cc
4 files changed, 189 insertions(+), 90 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Will Berkeley: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/9528
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Iba0c1f5b5a7083cbef99d3674dfebe1457075ce0
Gerrit-Change-Number: 9528
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong
Gerrit-Reviewer: Andrew Wong
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley
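The first two bullets of the commit message describe a classification rule, which may be easier to follow as code. The following is only a minimal sketch under stated assumptions, not the logic in ksck.cc: the types `Replica`, `DataState`, `TabletHealth` and the function `ClassifyTablet` are hypothetical stand-ins, and the rule "a majority is running and at least one non-running replica is copying" is an assumed reading of the commit message.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-ins; these are not the real Kudu types.
enum class DataState { kReady, kCopying, kOther };

struct Replica {
  bool running;          // whether the replica reports a running state
  DataState data_state;  // kCopying models TABLET_DATA_COPYING
};

enum class TabletHealth { kHealthy, kRecovering, kUnderReplicated, kUnavailable };

// Classifies a tablet from the states of its replicas.
// Assumption: a tablet with a running majority is RECOVERING if at least one
// of its non-running replicas is copying, and UNDER_REPLICATED otherwise.
TabletHealth ClassifyTablet(const std::vector<Replica>& replicas) {
  assert(!replicas.empty());
  int running = 0;
  int copying = 0;
  for (const Replica& r : replicas) {
    if (r.running) {
      ++running;
    } else if (r.data_state == DataState::kCopying) {
      ++copying;
    }
  }
  const int total = static_cast<int>(replicas.size());
  const int majority = total / 2 + 1;
  if (running == total) return TabletHealth::kHealthy;
  if (running < majority) return TabletHealth::kUnavailable;
  // A majority is running but some replicas are not.
  if (copying > 0) return TabletHealth::kRecovering;
  return TabletHealth::kUnderReplicated;
}
```

Under these assumptions, a three-replica tablet with two running replicas and one copying replica comes out as recovering rather than under-replicated, which is the less alarming report the patch is after.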
[kudu-CR] KUDU-2129 make ksck less scary when copying
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9528 )

Change subject: KUDU-2129 make ksck less scary when copying
......................................................................


Patch Set 3: Code-Review+2

--
To view, visit http://gerrit.cloudera.org:8080/9528
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iba0c1f5b5a7083cbef99d3674dfebe1457075ce0
Gerrit-Change-Number: 9528
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong
Gerrit-Reviewer: Andrew Wong
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley
Gerrit-Comment-Date: Thu, 08 Mar 2018 22:18:59 +
Gerrit-HasComments: No
[kudu-CR] KUDU-2129 make ksck less scary when copying
Hello Will Berkeley, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9528

to look at the new patch set (#3).

Change subject: KUDU-2129 make ksck less scary when copying
......................................................................

KUDU-2129 make ksck less scary when copying

This patch changes some of the outputs for ksck to be less troubling
when the cluster is recovering. Updates include:
- Report tablets with replicas whose data states are
  TABLET_DATA_COPYING as RECOVERING instead of UNDER_REPLICATED.
- Report tables with recovering tablets as RECOVERING instead of
  UNDER_REPLICATED.
- Report replicas that aren't running as "not running" instead of "bad
  state". Most non-RUNNING states have a time and place during a
  healthy cluster's lifespan.
- Don't log the error type with the errors when running the `ksck`
  tool. Our messages are generally good enough, and there isn't a
  perfect mapping to some errors. E.g. we should take the "Corruption"
  out of "Corruption: 1 out of 1 table(s) are bad"
- Report "not healthy" instead of "bad" in logs like the one above

Change-Id: Iba0c1f5b5a7083cbef99d3674dfebe1457075ce0
---
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/tool_action_cluster.cc
4 files changed, 189 insertions(+), 90 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/28/9528/3
--
To view, visit http://gerrit.cloudera.org:8080/9528
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iba0c1f5b5a7083cbef99d3674dfebe1457075ce0
Gerrit-Change-Number: 9528
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong
Gerrit-Reviewer: Andrew Wong
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2129 make ksck less scary when copying
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9528 )

Change subject: KUDU-2129 make ksck less scary when copying
......................................................................


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/9528/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9528/1//COMMIT_MSG@25
PS1, Line 25: 
> This is the one change I have an issue with because it makes it less clear
Yeah, I think essentially we want to convey these messages:
- The table is good, DON'T WORRY ABOUT IT
- The table is recovering, KEEP AN EYE ON IT
- The table is under-replicated/unavailable, DO SOMETHING ABOUT IT

I'll go with "not healthy" for now; hopefully its proximity to the table summary will be enough to connect some dots.


http://gerrit.cloudera.org:8080/#/c/9528/1/src/kudu/tools/ksck-test.cc
File src/kudu/tools/ksck-test.cc:

http://gerrit.cloudera.org:8080/#/c/9528/1/src/kudu/tools/ksck-test.cc@322
PS1, Line 322: 
> How about "are not healthy"?
Sure. I wanted to avoid being so direct because this could easily be read as "the table is unhealthy", but I see what you mean.


http://gerrit.cloudera.org:8080/#/c/9528/1/src/kudu/tools/ksck-test.cc@402
PS1, Line 402: RT_OK(ksck_->ChecksumData(ChecksumOptions()));
             :   ASSERT_STR_CONTAINS(err_stream_.str(),
             :                       "0/9 replicas remaining (180B from disk, 90 rows summed)");
             : 
> Are these the long lines you're complaining about? Maybe we can add a helpe
Yeah, I'm going to just reuse the printing code we have in the Ksck class. This ended up not being as pretty as I'd hoped because TableSummary is private, and I'd rather not make it public for the sake of tests. Alternatively, I could FRIEND_TEST everything.


http://gerrit.cloudera.org:8080/#/c/9528/1/src/kudu/tools/ksck-test.cc@420
PS1, Line 420: RT_STR_CONTAINS(err_stream_.str(), ExpectedKsckTableSummary("test",
             :                                                      /*healthy_tables=*/ 3,
             :                                                      /*recovering_tables=*/ 0,
             :                                                      /*underreplicated_tables=*/ 0,
             :                                                      /*consensus_mismatch_tables=*/ 0,
             :                                                      /*unavailable_tables=*/ 0));
             : }
> On the other hand I think the consensus matrix literals are kind of a featu
Yeah, I think the cmatrix stuff is fine. This patch doesn't really affect it, and I agree it's helpful to have the full output.


http://gerrit.cloudera.org:8080/#/c/9528/1/src/kudu/tools/ksck.h
File src/kudu/tools/ksck.h:

http://gerrit.cloudera.org:8080/#/c/9528/1/src/kudu/tools/ksck.h@461
PS1, Line 461:   // The tablet is healthy.
> Can you add a blurb like this for all non-OK statuses? Maybe like
Done


http://gerrit.cloudera.org:8080/#/c/9528/1/src/kudu/tools/ksck.h@489
PS1, Line 489: t TotalTablets() const {
             :     return healthy_tablets + recovering_tablets + underreplicated_tablets +
             :         consensus_mismatch_tablets + unavailable_tablets;
             :   }
             : 
             :   // 
> Isn't under-replicated worse than recovering?
Ah, missed that above comment.


--
To view, visit http://gerrit.cloudera.org:8080/9528
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iba0c1f5b5a7083cbef99d3674dfebe1457075ce0
Gerrit-Change-Number: 9528
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong
Gerrit-Reviewer: Andrew Wong
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley
Gerrit-Comment-Date: Wed, 07 Mar 2018 21:57:06 +
Gerrit-HasComments: Yes
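The review exchange above turns on how per-tablet counts roll up into one table status and which status counts as worse. Here is a minimal sketch of that idea, not the code in ksck.h. The field names and `TotalTablets()` come from the quoted snippet; the `CheckResult` enum, `TableStatus()`, `AdviceFor()`, the severity order, and the placement of consensus mismatch under "do something" are all assumptions made for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical result enum, listed from least to most severe (assumed order).
enum class CheckResult {
  HEALTHY,
  RECOVERING,
  UNDER_REPLICATED,
  CONSENSUS_MISMATCH,
  UNAVAILABLE,
};

struct TableSummary {
  int healthy_tablets = 0;
  int recovering_tablets = 0;
  int underreplicated_tablets = 0;
  int consensus_mismatch_tablets = 0;
  int unavailable_tablets = 0;

  int TotalTablets() const {
    return healthy_tablets + recovering_tablets + underreplicated_tablets +
        consensus_mismatch_tablets + unavailable_tablets;
  }

  // Hypothetical helper: the table takes the status of its worst tablet, so
  // under-replicated outranks recovering.
  CheckResult TableStatus() const {
    if (unavailable_tablets > 0) return CheckResult::UNAVAILABLE;
    if (consensus_mismatch_tablets > 0) return CheckResult::CONSENSUS_MISMATCH;
    if (underreplicated_tablets > 0) return CheckResult::UNDER_REPLICATED;
    if (recovering_tablets > 0) return CheckResult::RECOVERING;
    return CheckResult::HEALTHY;
  }
};

// Maps a status to one of the three operator-facing messages from the reply.
// Treating CONSENSUS_MISMATCH as "do something" is an assumption.
std::string AdviceFor(CheckResult result) {
  switch (result) {
    case CheckResult::HEALTHY:
      return "don't worry about it";
    case CheckResult::RECOVERING:
      return "keep an eye on it";
    case CheckResult::UNDER_REPLICATED:
    case CheckResult::CONSENSUS_MISMATCH:
    case CheckResult::UNAVAILABLE:
      return "do something about it";
  }
  assert(false);  // unreachable for valid enum values
  return "";
}
```

With this ordering, a table that has one recovering tablet and one under-replicated tablet is reported as under-replicated, so the gentler "recovering" label never masks a tablet that needs attention.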
[kudu-CR] KUDU-2129 make ksck less scary when copying
Hello Will Berkeley, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9528

to look at the new patch set (#2).

Change subject: KUDU-2129 make ksck less scary when copying
......................................................................

KUDU-2129 make ksck less scary when copying

This patch changes some of the outputs for ksck to be less troubling
when the cluster is recovering. Updates include:
- Report tablets with replicas whose data states are
  TABLET_DATA_COPYING as RECOVERING instead of UNDER_REPLICATED.
- Report tables with recovering tablets as RECOVERING instead of
  UNDER_REPLICATED.
- Report replicas that aren't running as "not running" instead of "bad
  state". Most non-RUNNING states have a time and place during a
  healthy cluster's lifespan.
- Don't log the error type with the errors when running the `ksck`
  tool. Our messages are generally good enough, and there isn't a
  perfect mapping to some errors. E.g. we should take the "Corruption"
  out of "Corruption: 1 out of 1 table(s) are bad"
- Report "not healthy" instead of "bad" in logs like the one above

Change-Id: Iba0c1f5b5a7083cbef99d3674dfebe1457075ce0
---
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/tool_action_cluster.cc
4 files changed, 189 insertions(+), 90 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/28/9528/2
--
To view, visit http://gerrit.cloudera.org:8080/9528
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iba0c1f5b5a7083cbef99d3674dfebe1457075ce0
Gerrit-Change-Number: 9528
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley
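The fourth bullet of the commit message, dropping the error type from what the `ksck` tool prints, can be illustrated with a small stand-in. This is a sketch under assumptions: `SimpleStatus` and `FormatKsckError` are hypothetical names invented here, modeled only loosely on a status object that carries an error code and a message, and they do not reproduce the real Kudu classes or the change made in tool_action_cluster.cc.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a status object with an error code and a message.
class SimpleStatus {
 public:
  enum class Code { kOk, kCorruption, kNetworkError };

  SimpleStatus(Code code, std::string msg)
      : code_(code), msg_(std::move(msg)) {}

  // Returns only the human-readable message, without the error type.
  const std::string& message() const { return msg_; }

  // Returns the message prefixed by the error type, e.g. "Corruption: ...".
  std::string ToString() const {
    switch (code_) {
      case Code::kOk:
        return "OK";
      case Code::kCorruption:
        return "Corruption: " + msg_;
      case Code::kNetworkError:
        return "Network error: " + msg_;
    }
    assert(false);  // unreachable for valid enum values
    return msg_;
  }

 private:
  Code code_;
  std::string msg_;
};

// Hypothetical formatter: print the message alone so that an ill-fitting
// error type such as "Corruption" never reaches the operator.
std::string FormatKsckError(const SimpleStatus& status) {
  return status.message();
}
```

For a status built with `Code::kCorruption` and the message "1 out of 1 table(s) are not healthy", `ToString()` yields the alarming "Corruption: ..." form, while `FormatKsckError()` returns just the message.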