Hello Alexey Serbin, Attila Bukor, Kudu Jenkins, Andrew Wong, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10061

to look at the new patch set (#6).

Change subject: [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags
......................................................................

[tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

This adds checks to ksck that look for experimental, unsafe, and hidden
flags set to non-default values on Kudu masters and tablet servers. If
any are found, ksck generates a table summarizing the different flags and
their values. For example:

          Flag          |        Value        |         Tags         |             Master
------------------------+---------------------+----------------------+-------------------------------
 codegen_dump_functions | true                | runtime,experimental | localhost:7052,localhost:7053
 min_compression_ratio  | 0.80000000000000004 | experimental         | all 3 server(s) checked
 safe_time_max_lag_ms   | 40000               | experimental         | localhost:7052
 safe_time_max_lag_ms   | 50000               | experimental         | localhost:7053

The table has one row for each unique (flag, value) pair, listing all
daemons with --flag=value. So, in the above output, there are two rows
for the flag --safe_time_max_lag_ms because it's set to two different
values on two masters. This makes it easy to scan for concerning flags
and their values.
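
The grouping described above can be sketched roughly as follows. This is
an illustration only, not the code in the patch: FlagReport, FlagTable,
and GroupByFlagAndValue are hypothetical names, and the sketch assumes
each daemon's non-default flags have already been gathered as plain
strings.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record (not a type from the patch): one non-default flag
// observed on one daemon.
struct FlagReport {
  std::string server;
  std::string flag;
  std::string value;
};

// A row is keyed by the (flag, value) pair. std::map keeps the rows
// ordered by flag name, then by value.
using FlagValueKey = std::pair<std::string, std::string>;
using FlagTable = std::map<FlagValueKey, std::vector<std::string>>;

// Collects every server that reported the same --flag=value into one row.
FlagTable GroupByFlagAndValue(const std::vector<FlagReport>& reports) {
  FlagTable rows;
  for (const FlagReport& report : reports) {
    rows[FlagValueKey(report.flag, report.value)].push_back(report.server);
  }
  return rows;
}
```

Keying on the pair rather than on the flag name alone is what makes the
same flag appear in two rows when two daemons disagree on its value.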

Since the output might not scale to a large number of servers, the CSV
of servers is abbreviated, by default, to 3 entries, with the number of
additional servers indicated. The number of entries before truncation
kicks in is controlled by --truncate_server_csv_length. Additionally, if
all checked servers have an unusual --flag=value, we call that out
specially. For example, the above table reprinted with
--truncate_server_csv_length=2 would look like

          Flag          |        Value        |         Tags         |                Master
------------------------+---------------------+----------------------+--------------------------------------
 codegen_dump_functions | true                | runtime,experimental | localhost:7052 and 1 other server(s)
 min_compression_ratio  | 0.80000000000000004 | experimental         | all 3 server(s) checked
 safe_time_max_lag_ms   | 40000               | experimental         | localhost:7052
 safe_time_max_lag_ms   | 50000               | experimental         | localhost:7053

assuming that there are 3 servers checked in total.
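
A rough sketch of the abbreviation logic follows. It is not the patch's
implementation: AbbreviateServers and JoinFirst are hypothetical names,
and the exact truncation boundary is an assumption inferred from the two
tables above, where two servers are printed in full at the default
length of 3 but collapse to one name plus "and 1 other server(s)" at
--truncate_server_csv_length=2. The sketch therefore prints at most
(length - 1) names before summarizing the rest; the real boundary may
differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Joins the first `count` names with commas and no spaces, matching the
// style of the tables above.
std::string JoinFirst(const std::vector<std::string>& names,
                      std::size_t count) {
  std::string out;
  for (std::size_t i = 0; i < count && i < names.size(); ++i) {
    if (i > 0) out += ",";
    out += names[i];
  }
  return out;
}

// Builds the last column of a row. `servers` lists the daemons that have
// the row's --flag=value, `total_checked` is the number of daemons
// checked, and `truncate_length` stands in for the value of
// --truncate_server_csv_length.
std::string AbbreviateServers(const std::vector<std::string>& servers,
                              std::size_t total_checked,
                              std::size_t truncate_length) {
  assert(servers.size() <= total_checked);
  // Every checked server has this --flag=value: call it out specially.
  if (servers.size() == total_checked) {
    return "all " + std::to_string(total_checked) + " server(s) checked";
  }
  // ASSUMPTION: keep at most (truncate_length - 1) names, never fewer
  // than one, so that the result matches both example tables.
  const std::size_t keep = std::max<std::size_t>(truncate_length, 2) - 1;
  if (servers.size() <= keep) {
    return JoinFirst(servers, servers.size());
  }
  const std::size_t others = servers.size() - keep;
  return JoinFirst(servers, keep) + " and " + std::to_string(others) +
         " other server(s)";
}
```

Under these assumptions, the two masters running
--codegen_dump_functions=true are printed in full with a length of 3 and
abbreviated with a length of 2, as in the tables.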

Having unusual flags or failing to gather flags isn't considered an
error, since it doesn't indicate the cluster is unhealthy (in the latter
case because the daemon may not support the GetFlags RPC). Instead,
flag checks surface their results in a new warnings section near the
end of the ksck output.

The new warnings section looks like this in context:

==================
Warnings:
==================
Some masters have unsafe, experimental, or hidden flags set
unable to get flag information for tablet server 812db6461bae4f62a651e132f783ab53 (127.0.0.1:7250): could not get status from server: Client connection negotiation failed: client connection to 127.0.0.1:7250: connect: Connection refused (error 61)
Some tablet servers have unsafe, experimental, or hidden flags set
tserver flag check error: 1 of 3 tservers' flags were not available

==================
Errors:
==================
Network error: error fetching info from tablet servers: failed to gather info for all tablet servers: 1 of 3 had errors

FAILED
Runtime error: ksck discovered errors

Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
---
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/ksck_remote-test.cc
M src/kudu/tools/ksck_remote.cc
M src/kudu/tools/ksck_remote.h
M src/kudu/tools/ksck_results.cc
M src/kudu/tools/ksck_results.h
8 files changed, 457 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/10061/6
--
To view, visit http://gerrit.cloudera.org:8080/10061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Gerrit-Change-Number: 10061
Gerrit-PatchSet: 6
Gerrit-Owner: Will Berkeley <wdberke...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <abu...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com>
