[kudu-CR] KUDU-2129 make ksck less scary when copying

2018-03-08 Thread Andrew Wong (Code Review)
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9528 )

Change subject: KUDU-2129 make ksck less scary when copying
..

KUDU-2129 make ksck less scary when copying

This patch changes some of the outputs for ksck to be less troubling
when the cluster is recovering. Updates include:
- Report tablets with replicas whose data states are TABLET_DATA_COPYING
  as RECOVERING instead of UNDER_REPLICATED.
- Report tables with recovering tablets as RECOVERING instead of
  UNDER_REPLICATED.
- Report replicas that aren't running as "not running" instead of "bad
  state". Most non-RUNNING states have a time and place during a healthy
  cluster's lifespan.
- Don't log the error type with the errors when running the `ksck` tool.
  Our messages are generally good enough, and there isn't a perfect
  mapping to some errors. E.g. we should take the "Corruption" out of
  "Corruption: 1 out of 1 table(s) are bad"
- Report "not healthy" instead of "bad" in logs like the one above

Change-Id: Iba0c1f5b5a7083cbef99d3674dfebe1457075ce0
Reviewed-on: http://gerrit.cloudera.org:8080/9528
Tested-by: Kudu Jenkins
Reviewed-by: Will Berkeley 
---
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/tool_action_cluster.cc
4 files changed, 189 insertions(+), 90 deletions(-)
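
The tablet-level reclassification described in the commit message can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in ksck.cc: the type and function names (ReplicaInfo, ReplicaDataState, TabletHealth, ClassifyTablet) are hypothetical, and the simple-majority rule is an assumption about how availability is decided.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-ins; these are not the types from ksck.h.
enum class ReplicaDataState { kReady, kCopying, kTombstoned };

struct ReplicaInfo {
  bool running;
  ReplicaDataState data_state;
};

enum class TabletHealth { kHealthy, kRecovering, kUnderReplicated, kUnavailable };

// Classifies a tablet from its replicas, assuming a simple-majority rule.
TabletHealth ClassifyTablet(const std::vector<ReplicaInfo>& replicas) {
  assert(!replicas.empty());
  int running = 0;
  bool copying = false;
  for (const ReplicaInfo& r : replicas) {
    if (r.running) {
      ++running;
    } else if (r.data_state == ReplicaDataState::kCopying) {
      copying = true;
    }
  }
  const int total = static_cast<int>(replicas.size());
  const int majority = total / 2 + 1;
  if (running == total) return TabletHealth::kHealthy;
  if (running < majority) return TabletHealth::kUnavailable;
  // A majority is running. A replica that is still copying means repair is
  // already under way, so report RECOVERING rather than UNDER_REPLICATED.
  return copying ? TabletHealth::kRecovering : TabletHealth::kUnderReplicated;
}
```

With three replicas, two running and one copying would come out as recovering, while two running and one tombstoned would still come out as under-replicated.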

Approvals:
  Kudu Jenkins: Verified
  Will Berkeley: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/9528
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Iba0c1f5b5a7083cbef99d3674dfebe1457075ce0
Gerrit-Change-Number: 9528
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2129 make ksck less scary when copying

2018-03-08 Thread Will Berkeley (Code Review)
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9528 )

Change subject: KUDU-2129 make ksck less scary when copying
..


Patch Set 3: Code-Review+2



Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iba0c1f5b5a7083cbef99d3674dfebe1457075ce0
Gerrit-Change-Number: 9528
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Thu, 08 Mar 2018 22:18:59 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2129 make ksck less scary when copying

2018-03-07 Thread Andrew Wong (Code Review)
Hello Will Berkeley, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9528

to look at the new patch set (#3).

Change subject: KUDU-2129 make ksck less scary when copying
..

KUDU-2129 make ksck less scary when copying

This patch changes some of the outputs for ksck to be less troubling
when the cluster is recovering. Updates include:
- Report tablets with replicas whose data states are TABLET_DATA_COPYING
  as RECOVERING instead of UNDER_REPLICATED.
- Report tables with recovering tablets as RECOVERING instead of
  UNDER_REPLICATED.
- Report replicas that aren't running as "not running" instead of "bad
  state". Most non-RUNNING states have a time and place during a healthy
  cluster's lifespan.
- Don't log the error type with the errors when running the `ksck` tool.
  Our messages are generally good enough, and there isn't a perfect
  mapping to some errors. E.g. we should take the "Corruption" out of
  "Corruption: 1 out of 1 table(s) are bad"
- Report "not healthy" instead of "bad" in logs like the one above

Change-Id: Iba0c1f5b5a7083cbef99d3674dfebe1457075ce0
---
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/tool_action_cluster.cc
4 files changed, 189 insertions(+), 90 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/28/9528/3

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iba0c1f5b5a7083cbef99d3674dfebe1457075ce0
Gerrit-Change-Number: 9528
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2129 make ksck less scary when copying

2018-03-07 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9528 )

Change subject: KUDU-2129 make ksck less scary when copying
..


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/9528/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9528/1//COMMIT_MSG@25
PS1, Line 25:
> This is the one change I have an issue with because it makes it less clear
Yeah, I think essentially we want to convey the messages:
- The table is good, DON'T WORRY ABOUT IT
- The table is recovering, KEEP AN EYE ON IT
- The table is under-replicated/unavailable, DO SOMETHING ABOUT IT

I'll go with "not healthy" for now; hopefully its proximity to the table
summary will be enough to connect the dots.
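
As a rough illustration of the three messages above, the mapping from a table's state to the takeaway for the operator might look like the sketch below. The names TableHealth, OperatorGuidance, and UnhealthySummary are hypothetical and not from the patch; only the "N out of M table(s) are not healthy" wording follows the commit message.

```cpp
#include <string>

// Hypothetical table states, named after the ones discussed in this review.
enum class TableHealth { kHealthy, kRecovering, kUnderReplicated, kUnavailable };

// Returns the takeaway an operator should get from each state.
std::string OperatorGuidance(TableHealth health) {
  switch (health) {
    case TableHealth::kHealthy:
      return "don't worry about it";
    case TableHealth::kRecovering:
      return "keep an eye on it";
    case TableHealth::kUnderReplicated:
    case TableHealth::kUnavailable:
      return "do something about it";
  }
  return "unknown";
}

// Builds the summary line without a status-type prefix such as "Corruption: ".
std::string UnhealthySummary(int unhealthy, int total) {
  return std::to_string(unhealthy) + " out of " + std::to_string(total) +
         " table(s) are not healthy";
}
```

Printing this summary line right next to the per-table states is what lets a reader tell "recovering" apart from "needs attention".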


http://gerrit.cloudera.org:8080/#/c/9528/1/src/kudu/tools/ksck-test.cc
File src/kudu/tools/ksck-test.cc:

http://gerrit.cloudera.org:8080/#/c/9528/1/src/kudu/tools/ksck-test.cc@322
PS1, Line 322:
> How about "are not healthy"?
Sure, I wanted to avoid being so direct because this could easily be seen as 
"the table is unhealthy", but I see what you mean.


http://gerrit.cloudera.org:8080/#/c/9528/1/src/kudu/tools/ksck-test.cc@402
PS1, Line 402: RT_OK(ksck_->ChecksumData(ChecksumOptions()));
 :   ASSERT_STR_CONTAINS(err_stream_.str(),
 :   "0/9 replicas remaining (180B from disk, 90 rows summed)");
 :
> Are these the long lines you're complaining about? Maybe we can add a helpe
Yeah, I'm going to just reuse the printing code we have in the Ksck class.

This ended up not being as pretty as I'd hoped because TableSummary is private, 
and I'd rather not make it public for the sake of tests. Alternatively, I could 
FRIEND_TEST everything.
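
A minimal sketch of the "reuse the printing code" idea, assuming the counts can be passed as plain integers so the test never has to touch the private summary type. Everything here is hypothetical: TableCounts, PrintTableCounts, and ExpectedTableCounts are illustration-only names, and the output format is invented rather than the one ksck actually prints.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the per-table counts kept by the checker.
struct TableCounts {
  int healthy = 0;
  int recovering = 0;
  int underreplicated = 0;
  int consensus_mismatch = 0;
  int unavailable = 0;

  int Total() const {
    return healthy + recovering + underreplicated +
           consensus_mismatch + unavailable;
  }
};

// One printing routine shared by the tool and its tests.
// The output format is invented for illustration.
void PrintTableCounts(const std::string& name, const TableCounts& c,
                      std::ostream& out) {
  out << name << ": " << c.Total() << " tablet(s), "
      << c.healthy << " healthy, "
      << c.recovering << " recovering, "
      << c.underreplicated << " under-replicated, "
      << c.consensus_mismatch << " consensus mismatch, "
      << c.unavailable << " unavailable\n";
}

// Test-side helper: builds the expected text through the same routine, so a
// change to the format does not require editing long string literals in tests.
std::string ExpectedTableCounts(const std::string& name, int healthy,
                                int recovering, int underreplicated,
                                int consensus_mismatch, int unavailable) {
  TableCounts c;
  c.healthy = healthy;
  c.recovering = recovering;
  c.underreplicated = underreplicated;
  c.consensus_mismatch = consensus_mismatch;
  c.unavailable = unavailable;
  std::ostringstream out;
  PrintTableCounts(name, c, out);
  return out.str();
}
```

The trade-off is that such a test checks the counts but no longer pins down the exact wording of the output, which is one reason to keep some literal expectations elsewhere.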


http://gerrit.cloudera.org:8080/#/c/9528/1/src/kudu/tools/ksck-test.cc@420
PS1, Line 420: RT_STR_CONTAINS(err_stream_.str(), ExpectedKsckTableSummary("test",
 :     /*healthy_tables=*/ 3,
 :     /*recovering_tables=*/ 0,
 :     /*underreplicated_tables=*/ 0,
 :     /*consensus_mismatch_tables=*/ 0,
 :     /*unavailable_tables=*/ 0));
 : }
> On the other hand I think the consensus matrix literals are kind of a featu
Yeah, I think the cmatrix stuff is fine, this patch doesn't really affect it 
and I agree it's helpful to have the full output.


http://gerrit.cloudera.org:8080/#/c/9528/1/src/kudu/tools/ksck.h
File src/kudu/tools/ksck.h:

http://gerrit.cloudera.org:8080/#/c/9528/1/src/kudu/tools/ksck.h@461
PS1, Line 461: // The tablet is healthy.
> Can you add a blurb like this for all non-OK statuses? Maybe like
Done


http://gerrit.cloudera.org:8080/#/c/9528/1/src/kudu/tools/ksck.h@489
PS1, Line 489: t TotalTablets() const {
 :   return healthy_tablets + recovering_tablets + underreplicated_tablets +
 :   consensus_mismatch_tablets + unavailable_tablets;
 : }
 :
 : //
> Isn't under-replicated worse than recovering?
Ah, missed that above comment.
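
One way to keep the severity question from coming up again is to declare the states from least to most severe and derive a table's state as the worst of its tablets. The sketch below assumes that approach; the enum and WorstOf are hypothetical, and the relative position of the consensus-mismatch state is a guess based on the order of the counters quoted above.

```cpp
#include <algorithm>
#include <vector>

// Declared from least to most severe, so the built-in comparison operators on
// the enum reflect severity. Recovering sorts below under-replicated.
enum class Health {
  kHealthy = 0,
  kRecovering,
  kUnderReplicated,
  kConsensusMismatch,  // Position is an assumption.
  kUnavailable,
};

// Returns the most severe state among the given tablets. A table with no
// tablets is treated as healthy.
Health WorstOf(const std::vector<Health>& tablets) {
  Health worst = Health::kHealthy;
  for (Health h : tablets) {
    worst = std::max(worst, h);
  }
  return worst;
}
```

Under this ordering a table with one recovering tablet and one under-replicated tablet is reported as under-replicated.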




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iba0c1f5b5a7083cbef99d3674dfebe1457075ce0
Gerrit-Change-Number: 9528
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Wed, 07 Mar 2018 21:57:06 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2129 make ksck less scary when copying

2018-03-07 Thread Andrew Wong (Code Review)
Hello Will Berkeley, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9528

to look at the new patch set (#2).

Change subject: KUDU-2129 make ksck less scary when copying
..

KUDU-2129 make ksck less scary when copying

This patch changes some of the outputs for ksck to be less troubling
when the cluster is recovering. Updates include:
- Report tablets with replicas whose data states are TABLET_DATA_COPYING
  as RECOVERING instead of UNDER_REPLICATED.
- Report tables with recovering tablets as RECOVERING instead of
  UNDER_REPLICATED.
- Report replicas that aren't running as "not running" instead of "bad
  state". Most non-RUNNING states have a time and place during a healthy
  cluster's lifespan.
- Don't log the error type with the errors when running the `ksck` tool.
  Our messages are generally good enough, and there isn't a perfect
  mapping to some errors. E.g. we should take the "Corruption" out of
  "Corruption: 1 out of 1 table(s) are bad"
- Report "not healthy" instead of "bad" in logs like the one above

Change-Id: Iba0c1f5b5a7083cbef99d3674dfebe1457075ce0
---
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/tool_action_cluster.cc
4 files changed, 189 insertions(+), 90 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/28/9528/2

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iba0c1f5b5a7083cbef99d3674dfebe1457075ce0
Gerrit-Change-Number: 9528
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley