Will Berkeley has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10061 )
Change subject: [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags
......................................................................
[tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags
This adds checks to ksck that look for experimental, unsafe, and hidden
flags set to non-default values on Kudu masters and tablet servers. If
any are found, ksck generates a table summarizing the different flags and
their values. For example:
 Flag                   | Value               | Tags                 | Master
------------------------+---------------------+----------------------+-------------------------------
 codegen_dump_functions | true                | runtime,experimental | localhost:7052,localhost:7053
 min_compression_ratio  | 0.80000000000000004 | experimental         | all 3 server(s) checked
 safe_time_max_lag_ms   | 40000               | experimental         | localhost:7052
 safe_time_max_lag_ms   | 50000               | experimental         | localhost:7053
The table has one row for each unique (flag, value) pair, listing all
daemons with --flag=value. So, in the above output, there are two rows
for the flag --safe_time_max_lag_ms because it's set to two different
values on two masters. This makes it easy to scan for concerning flags
and their values.
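
As an illustration only (this is a Python sketch, not the C++ code in ksck.cc), the grouping described above can be modeled as inverting a per-server flag map into one row per (flag, value) pair. The function name, the input shape, and the third master address (localhost:7051) are assumptions made for the example.

```python
from collections import defaultdict


def group_by_flag_value(flags_by_server):
    """Return {(flag, value): [servers]} with one entry per unique pair.

    flags_by_server maps a server address to a dict of flag name -> value
    (values kept as strings, as they would appear on the command line).
    """
    servers_by_pair = defaultdict(list)
    for server, flags in flags_by_server.items():
        for flag, value in flags.items():
            servers_by_pair[(flag, value)].append(server)
    # Sort rows by (flag, value) and servers within a row for stable output.
    return {pair: sorted(servers)
            for pair, servers in sorted(servers_by_pair.items())}


# Hypothetical input matching the table above; localhost:7051 is assumed.
masters = {
    "localhost:7051": {"min_compression_ratio": "0.80000000000000004"},
    "localhost:7052": {"min_compression_ratio": "0.80000000000000004",
                       "codegen_dump_functions": "true",
                       "safe_time_max_lag_ms": "40000"},
    "localhost:7053": {"min_compression_ratio": "0.80000000000000004",
                       "codegen_dump_functions": "true",
                       "safe_time_max_lag_ms": "50000"},
}

for (flag, value), servers in group_by_flag_value(masters).items():
    print(flag, value, ",".join(servers))
```

Keying on the pair rather than on the flag alone is what yields two separate rows for safe_time_max_lag_ms.
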
Since the output might not scale to a large number of
servers, the CSV of servers is abbreviated, by default, to 3 entries,
with the number of additional servers indicated. The number of entries
before truncation kicks in is controlled by --truncate_server_csv_length.
Additionally, if all checked servers have an unusual --flag=value, we call
that out specially. For example, the above table reprinted with
--truncate_server_csv_length=1 would look like:
 Flag                   | Value               | Tags                 | Master
------------------------+---------------------+----------------------+--------------------------------------
 codegen_dump_functions | true                | runtime,experimental | localhost:7052 and 1 other server(s)
 min_compression_ratio  | 0.80000000000000004 | experimental         | all 3 server(s) checked
 safe_time_max_lag_ms   | 40000               | experimental         | localhost:7052
 safe_time_max_lag_ms   | 50000               | experimental         | localhost:7053
assuming that there are 3 servers checked in total.
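
The abbreviation rule can be sketched as follows. This is a hedged Python illustration of the behavior described in this message, not the actual ksck code; the function name and parameters are hypothetical, with max_entries standing in for --truncate_server_csv_length.

```python
def summarize_servers(servers, total_checked, max_entries=3):
    """Render the last table column for one (flag, value) row.

    servers: addresses that have this --flag=value set.
    total_checked: number of servers whose flags were checked.
    max_entries: stand-in for --truncate_server_csv_length (default 3).
    """
    # Every checked server has the flag set: call that out specially.
    if len(servers) == total_checked:
        return f"all {total_checked} server(s) checked"
    # Short enough to list in full.
    if len(servers) <= max_entries:
        return ",".join(servers)
    # Otherwise list the first max_entries and count the rest.
    shown = ",".join(servers[:max_entries])
    return f"{shown} and {len(servers) - max_entries} other server(s)"


two = ["localhost:7052", "localhost:7053"]
print(summarize_servers(two, total_checked=3))                 # default limit
print(summarize_servers(two, total_checked=3, max_entries=1))  # truncated
```

Checking the "all servers" case first means that row reads the same no matter how the truncation length is set, as in both tables above.
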
Neither unusual flags nor a failure to gather flags is considered an
error, since neither indicates that the cluster is unhealthy (a failure
to gather flags may simply mean the daemon does not support the GetFlags
RPC). Instead, flag checks surface their results in a new warnings
section near the end of the ksck output.
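
A minimal Python sketch of that separation, under the assumption that warnings and errors are collected in separate lists and only errors affect the overall result. The class, method names, and the wording of the second warning are hypothetical and do not come from ksck_results.h.

```python
from dataclasses import dataclass, field


@dataclass
class CheckSummary:
    """Hypothetical container that keeps warnings apart from errors."""
    warnings: list = field(default_factory=list)
    errors: list = field(default_factory=list)

    def record_flag_check(self, kind, has_unusual_flags,
                          num_unavailable, num_checked):
        # Flag check findings are informational, so they become warnings.
        if has_unusual_flags:
            self.warnings.append(
                f"Some {kind} have unsafe, experimental, or hidden flags set")
        if num_unavailable > 0:
            self.warnings.append(
                f"{num_unavailable} of {num_checked} {kind}: "
                "flags were not available")

    def healthy(self):
        # Only errors make the overall check fail.
        return not self.errors


summary = CheckSummary()
summary.record_flag_check("masters", True, 0, 3)
summary.record_flag_check("tablet servers", True, 1, 3)
print(summary.healthy())  # True: warnings alone do not fail the check
```
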
The new warnings section looks like this in context:
==================
Warnings:
==================
Some masters have unsafe, experimental, or hidden flags set
unable to get flag information for tablet server 812db6461bae4f62a651e132f783ab53 (127.0.0.1:7250): could not get status from server: Client connection negotiation failed: client connection to 127.0.0.1:7250: connect: Connection refused (error 61)
Some tablet servers have unsafe, experimental, or hidden flags set
tserver flag check error: 1 of 3 tservers' flags were not available
==================
Errors:
==================
Network error: error fetching info from tablet servers: failed to gather info for all tablet servers: 1 of 3 had errors
FAILED
Runtime error: ksck discovered errors
Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Reviewed-on: http://gerrit.cloudera.org:8080/10061
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong <[email protected]>
Reviewed-by: Attila Bukor <[email protected]>
---
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/ksck_remote-test.cc
M src/kudu/tools/ksck_remote.cc
M src/kudu/tools/ksck_remote.h
M src/kudu/tools/ksck_results.cc
M src/kudu/tools/ksck_results.h
8 files changed, 459 insertions(+), 4 deletions(-)
Approvals:
Kudu Jenkins: Verified
Andrew Wong: Looks good to me, approved
Attila Bukor: Looks good to me, but someone else must approve
--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Gerrit-Change-Number: 10061
Gerrit-PatchSet: 11
Gerrit-Owner: Will Berkeley <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Will Berkeley <[email protected]>