Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12516 )
Change subject: [tools] Add tool 'perf table_scan' ...................................................................... Patch Set 7: (16 comments) http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/kudu-tool-test.cc File src/kudu/tools/kudu-tool-test.cc: http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/kudu-tool-test.cc@410 PS7, Line 410: for (auto line = lines.begin(); line != lines.end();) { Looks like every iteration of the loop winds up erasing 'line', which is always the current element. Why not a simpler for loop then? for (const auto& line : lines) Why use iterators and erase at all? OK, on second read, I see that the break on L435 is there to help assert that the last expected line really is the last line. I think you can achieve this via simple for loop and an index that keeps track of the last line read. When you break early, you can compare the index to lines.size() to enforce that the last expected line really is the last line. http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/kudu-tool-test.cc@416 PS7, Line 416: project projected http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/kudu-tool-test.cc@438 PS7, Line 438: if (FLAGS_show_value) { : ASSERT_EQ(value, max_value + 1); : } Why is this wrapped in this condition? I don't see any place in the test setting this flag, and it seems like something you could assert on regardless. http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/kudu-tool-test.cc@1809 PS7, Line 1809: RunScanTableCheck(kTableName, "", 1, 2000, {}, "perf table_scan"); Wrap in NO_FATALS. http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/table_scanner.h File src/kudu/tools/table_scanner.h: http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/table_scanner.h@42 PS7, Line 42: namespace kudu { : namespace tools { : using client::KuduClient; : using client::KuduScanToken; : using std::ostream; : using std::string; : using std::vector; : Nit: stylistically we've actually been trending in the opposite direction: consolidating all 'using' statements outside of all namespaces in one group. http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/table_scanner.h@51 PS7, Line 51: public: Nit: public/private should be indented by an additional character. http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/table_scanner.h@59 PS7, Line 59: void SetOutput(ostream* out); Should doc what happens if this is called vs. not called. Should also doc the expected lifetimes (i.e. 'out' must outlive the TableScanner itself). Same with SetSnapshot. http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/table_scanner.h@64 PS7, Line 64: uint64_t TotalScannedCount() const { : return total_count_.Load(); : } This doesn't appear to be used; could you remove it until such a time as someone needs it? http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/table_scanner.h@72 PS7, Line 72: private: Nit: don't need this private; there's one on L68. http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/table_scanner.h@79 PS7, Line 79: // Protects output to stdout so that rows don't get interleaved. Now it protects output to out_, right? http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/table_scanner.cc File src/kudu/tools/table_scanner.cc: http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/table_scanner.cc@90 PS7, Line 90: DEFINE_bool(show_value, false, Kind of wish we'd rename this to "show_values" or even "show_row_data" to be more explicit. http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/tool_action_common.h File src/kudu/tools/tool_action_common.h: http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/tool_action_common.h@147 PS7, Line 147: // Create kudu client connect to 'kMasterAddressesArg' cluster. Nit: reword as "Creates a Kudu client connected to the cluster whose master addresses are defined by the kMasterAddressesArg argument in 'context'." http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/tool_action_perf.cc File src/kudu/tools/tool_action_perf.cc: http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/tool_action_perf.cc@575 PS7, Line 575: // It's necessary to have read-what-you-write behavior here. Since I don't understand this. The tool isn't doing any writing and this KuduClient instance hasn't done any scans yet, so won't GetLatestObservedTimestamp() return kNoTimestamp? http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/tool_action_perf.cc@575 PS7, Line 575: // It's necessary to have read-what-you-write behavior here. Since : // tablet leader can change and there might be replica propagation delays, : // set the snapshot to the latest observed one to get correct row count. : // Due to KUDU-1656, there might be timeouts due to tservers catching up with : // the requested snapshot. The simple workaround: if the timeout error occurs, : // retry the row count operation. Part (or maybe all?) of this comment now belongs within TableScanner.Run(). http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/tool_action_perf.cc@581 PS7, Line 581: // NOTE: +1 is due to the current implementation of the scanner. Can you be more specific? http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/tool_action_perf.cc@758 PS7, Line 758: .ExtraDescription("Show row count and scanning time cost of tablets in a table, it's useful " : "to check whether there is a row count skew in different tablets, or to " : "check whether there is a long tail to scan different tablets.") Nit: reword as "Show row count and scanning time of tablets in a table. This can be useful to check for row count skew across different tablets, or whether there is a long latency tail when scanning different tables". -- To view, visit http://gerrit.cloudera.org:8080/12516 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibebed05d9a91ae3f126d3cf2d92f66852b253edd Gerrit-Change-Number: 12516 Gerrit-PatchSet: 7 Gerrit-Owner: Yingchun Lai <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Reviewer: Will Berkeley <[email protected]> Gerrit-Reviewer: Yingchun Lai <[email protected]> Gerrit-Comment-Date: Fri, 15 Mar 2019 18:28:56 +0000 Gerrit-HasComments: Yes
