Hello Alexey Serbin, Attila Bukor, Kudu Jenkins, Andrew Wong, Todd Lipcon,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10061

to look at the new patch set (#7).

Change subject: [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags
......................................................................

[tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

This adds checks to ksck that look for experimental, unsafe, and hidden
flags set to non-default values on Kudu masters and tablet servers. If
any are found, ksck generates a table summarizing the different flags
and their values. For example:

 Flag                   | Value               | Tags                 | Master
------------------------+---------------------+----------------------+-------------------------------
 codegen_dump_functions | true                | runtime,experimental | localhost:7052,localhost:7053
 min_compression_ratio  | 0.80000000000000004 | experimental         | all 3 server(s) checked
 safe_time_max_lag_ms   | 40000               | experimental         | localhost:7052
 safe_time_max_lag_ms   | 50000               | experimental         | localhost:7053

The table has one row for each unique (flag, value) pair, listing all
daemons with --flag=value. So, in the above output, there are two rows
for the flag --safe_time_max_lag_ms because it's set to two different
values on two masters. This makes it easy to scan for concerning flags
and their values.

Since the output might not scale to a large number of servers, the CSV
of servers is abbreviated, by default, to 3 entries, with the number of
additional servers indicated. The number of entries before truncation
kicks in is controlled by --truncate_server_csv_length. Additionally, if
all checked servers have an unusual --flag=value we call that out
specially.
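
The grouping and abbreviation described in this message can be sketched
roughly as follows. This is an illustrative sketch, not the code in the
patch: FlagSetting, GroupByFlagValue, JoinFirst, and AbbreviateServers
are hypothetical names, and the exact truncation rule (a list with at
least `limit` entries keeps `limit - 1` names plus a summary) is an
assumption inferred from the example tables in this message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record: one non-default flag reported by one daemon.
struct FlagSetting {
  std::string server;  // e.g. "localhost:7052"
  std::string flag;
  std::string value;
};

using FlagValue = std::pair<std::string, std::string>;

// Collects the servers for each unique (flag, value) pair.
// Each map entry corresponds to one row of the summary table.
std::map<FlagValue, std::vector<std::string>> GroupByFlagValue(
    const std::vector<FlagSetting>& settings) {
  std::map<FlagValue, std::vector<std::string>> rows;
  for (const FlagSetting& s : settings) {
    rows[FlagValue(s.flag, s.value)].push_back(s.server);
  }
  return rows;
}

// Joins the first `count` items with commas.
std::string JoinFirst(const std::vector<std::string>& items,
                      std::size_t count) {
  std::string out;
  for (std::size_t i = 0; i < count && i < items.size(); ++i) {
    if (i > 0) out += ",";
    out += items[i];
  }
  return out;
}

// Renders the last table column for one row.
// `limit` plays the role of --truncate_server_csv_length.
// Assumed rule: lists shorter than `limit` are printed in full; longer
// ones keep `limit - 1` names and summarize the rest.
std::string AbbreviateServers(const std::vector<std::string>& servers,
                              std::size_t total_checked,
                              std::size_t limit) {
  assert(servers.size() <= total_checked);
  // Special case: every checked server has this --flag=value.
  if (!servers.empty() && servers.size() == total_checked) {
    return "all " + std::to_string(total_checked) + " server(s) checked";
  }
  if (servers.size() < limit || servers.size() < 2) {
    return JoinFirst(servers, servers.size());
  }
  // Always keep at least one name, and always summarize at least one.
  const std::size_t keep = std::max<std::size_t>(limit, 2) - 1;
  return JoinFirst(servers, keep) + " and " +
         std::to_string(servers.size() - keep) + " other server(s)";
}
```

Under these assumptions, with 3 servers checked, the two masters
localhost:7052 and localhost:7053 render as
"localhost:7052,localhost:7053" when limit is 3 and as
"localhost:7052 and 1 other server(s)" when limit is 2.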

For example, the above table reprinted with
--truncate_server_csv_length=2 would look like

 Flag                   | Value               | Tags                 | Master
------------------------+---------------------+----------------------+--------------------------------------
 codegen_dump_functions | true                | runtime,experimental | localhost:7052 and 1 other server(s)
 min_compression_ratio  | 0.80000000000000004 | experimental         | all 3 server(s) checked
 safe_time_max_lag_ms   | 40000               | experimental         | localhost:7052
 safe_time_max_lag_ms   | 50000               | experimental         | localhost:7053

assuming that there are 3 servers checked in total.

Having unusual flags or failing to gather flags isn't considered an
error, since it doesn't indicate the cluster is unhealthy (in the latter
case because the daemon may not support the GetFlags RPC). Instead, flag
checks surface their results in a new warnings section near the end of
the ksck output. The new warnings section looks like this in context:

==================
Warnings:
==================
Some masters have unsafe, experimental, or hidden flags set
unable to get flag information for tablet server 812db6461bae4f62a651e132f783ab53 (127.0.0.1:7250): could not get status from server: Client connection negotiation failed: client connection to 127.0.0.1:7250: connect: Connection refused (error 61)
Some tablet servers have unsafe, experimental, or hidden flags set
tserver flag check error: 1 of 3 tservers' flags were not available

==================
Errors:
==================
Network error: error fetching info from tablet servers: failed to gather info for all tablet servers: 1 of 3 had errors

FAILED
Runtime error: ksck discovered errors

Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
---
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/ksck_remote-test.cc
M src/kudu/tools/ksck_remote.cc
M src/kudu/tools/ksck_remote.h
M src/kudu/tools/ksck_results.cc
M src/kudu/tools/ksck_results.h
8 files changed, 458 insertions(+), 9 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/10061/7
--
To view, visit http://gerrit.cloudera.org:8080/10061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Gerrit-Change-Number: 10061
Gerrit-PatchSet: 7
Gerrit-Owner: Will Berkeley <wdberke...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <abu...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com>