Adar Dembo has posted comments on this change. Change subject: tools: add a tool to summarize data size on a tablet or tablets ......................................................................
Patch Set 2: (11 comments) http://gerrit.cloudera.org:8080/#/c/5727/2/src/kudu/tools/kudu-tool-test.cc File src/kudu/tools/kudu-tool-test.cc: Line 387: const vector<string> kLocalReplicaModeRegexes = { Update this? Line 1119: "KuduTableTestId | test-tablet | 0 | c10 (key) | \\S+\n", Why are these column IDs c10, c11, c12 and not c1, c2, c3? AFAICT the schema only has 3 columns. Line 1143: " KuduTableTestId | * | * | * | \\S+\n"}) { Nit: fix indentation? Looks like some rows are written differently than others, and not intentionally? http://gerrit.cloudera.org:8080/#/c/5727/2/src/kudu/tools/tool_action_common.h File src/kudu/tools/tool_action_common.h: Line 65: extern const char* const kTabletIdGlobArg; You could define these just once in tool_action_local_replica.cc, since they're only used there. http://gerrit.cloudera.org:8080/#/c/5727/2/src/kudu/tools/tool_action_local_replica.cc File src/kudu/tools/tool_action_local_replica.cc: Line 385: *running_sum += size; Nit: let's only update the OUT param on success. Line 386: if (VLOG_IS_ON(1)) { Do you want to piggy-back on the CLI's existing --verbose flag for this? Line 387: cout << block_type << " block " << b.ToString() << ": " Nit: Substitute() so this is more readable? Line 447: if (!MatchPattern(tablet_id, tablet_id_pattern)) continue; Should probably fully document the pattern matching semantics in the tool's help (i.e. incorporate '?', talk about escaping, talk about limits, etc) Line 471: (col_idx != Schema::kColumnNotFound) ? Is that possible? For a dropped column? Would be nice to get some test coverage on this case. Line 969: .Description("Summarize the data size/space usage of the given local replica(s).") It would also be nice to slice the data vertically. For example, I want to see just the BLOOM usage. You could define an optional parameter whose default value is "bloom,undo,redo,pk,data" and use that to control what gets summed up. Bonus points if "data" could be broken up into "column 1, column 2, etc.", though that's more challenging. Line 982: .AddAction(std::move(data_size)) Nit: resort alphabetically. -- To view, visit http://gerrit.cloudera.org:8080/5727 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibf0237f9d01f4ec332d1df545d2c2f51f64ce012 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Jean-Daniel Cryans <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Will Berkeley <[email protected]> Gerrit-HasComments: Yes
