Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12516 )

Change subject: [tools] Add tool 'perf table_scan'
......................................................................


Patch Set 7:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/kudu-tool-test.cc@410
PS7, Line 410:     for (auto line = lines.begin(); line != lines.end();) {
Looks like every iteration of the loop winds up erasing 'line', which is always 
the current element. Why not a simpler for loop then?

  for (const auto& line : lines)

Why use iterators and erase at all?

OK, on second read, I see that the break on L435 is there to help assert that 
the last expected line really is the last line. I think you can achieve this 
via simple for loop and an index that keeps track of the last line read. When 
you break early, you can compare the index to lines.size() to enforce that the 
last expected line really is the last line.


http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/kudu-tool-test.cc@416
PS7, Line 416: project
projected


http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/kudu-tool-test.cc@438
PS7, Line 438:     if (FLAGS_show_value) {
             :       ASSERT_EQ(value, max_value + 1);
             :     }
Why is this wrapped in this condition? I don't see any place in the test 
setting this flag, and it seems like something you could assert on regardless.


http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/kudu-tool-test.cc@1809
PS7, Line 1809:   RunScanTableCheck(kTableName, "", 1, 2000, {}, "perf 
table_scan");
Wrap in NO_FATALS.


http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/table_scanner.h
File src/kudu/tools/table_scanner.h:

http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/table_scanner.h@42
PS7, Line 42: namespace kudu {
            : namespace tools {
            : using client::KuduClient;
            : using client::KuduScanToken;
            : using std::ostream;
            : using std::string;
            : using std::vector;
            :
Nit: stylistically we've actually been trending in the opposite direction: 
consolidating all 'using' statements outside of all namespaces in one group.


http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/table_scanner.h@51
PS7, Line 51: public:
Nit: public/private should be indented by an additional character.


http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/table_scanner.h@59
PS7, Line 59:   void SetOutput(ostream* out);
Should doc what happens if this is called vs. not called. Should also doc the 
expected lifetimes (i.e. 'out' must outlive the TableScanner itself).

Same with SetSnapshot.


http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/table_scanner.h@64
PS7, Line 64:   uint64_t TotalScannedCount() const {
            :     return total_count_.Load();
            :   }
This doesn't appear to be used; could you remove it until such a time as 
someone needs it?


http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/table_scanner.h@72
PS7, Line 72: private:
Nit: don't need this private; there's one on L68.


http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/table_scanner.h@79
PS7, Line 79:   // Protects output to stdout so that rows don't get interleaved.
Now it protects output to out_, right?


http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/table_scanner.cc
File src/kudu/tools/table_scanner.cc:

http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/table_scanner.cc@90
PS7, Line 90: DEFINE_bool(show_value, false,
Kind of wish we'd rename this to "show_values" or even "show_row_data" to be 
more explicit.


http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/tool_action_common.h
File src/kudu/tools/tool_action_common.h:

http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/tool_action_common.h@147
PS7, Line 147: // Create kudu client connect to 'kMasterAddressesArg' cluster.
Nit: reword as "Creates a Kudu client connected to the cluster whose master 
addresses are defined by the kMasterAddressesArg argument in 'context'."


http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/tool_action_perf.cc
File src/kudu/tools/tool_action_perf.cc:

http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/tool_action_perf.cc@575
PS7, Line 575:   // It's necessary to have read-what-you-write behavior here. 
Since
I don't understand this. The tool isn't doing any writing and this KuduClient 
instance hasn't done any scans yet, so won't GetLatestObservedTimestamp() 
return kNoTimestamp?


http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/tool_action_perf.cc@575
PS7, Line 575:   // It's necessary to have read-what-you-write behavior here. 
Since
             :   // tablet leader can change and there might be replica 
propagation delays,
             :   // set the snapshot to the latest observed one to get correct 
row count.
             :   // Due to KUDU-1656, there might be timeouts due to tservers 
catching up with
             :   // the requested snapshot. The simple workaround: if the 
timeout error occurs,
             :   // retry the row count operation.
Part (or maybe all?) of this comment now belongs within TableScanner.Run().


http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/tool_action_perf.cc@581
PS7, Line 581:   // NOTE: +1 is due to the current implementation of the 
scanner.
Can you be more specific?


http://gerrit.cloudera.org:8080/#/c/12516/7/src/kudu/tools/tool_action_perf.cc@758
PS7, Line 758:       .ExtraDescription("Show row count and scanning time cost 
of tablets in a table, it's useful "
             :                         "to check whether there is a row count 
skew in different tablets, or to "
             :                         "check whether there is a long tail to 
scan different tablets.")
Nit: reword as "Show row count and scanning time of tablets in a table. This 
can be useful to check for row count skew across different tablets, or whether 
there is a long latency tail when scanning different tables".



--
To view, visit http://gerrit.cloudera.org:8080/12516
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibebed05d9a91ae3f126d3cf2d92f66852b253edd
Gerrit-Change-Number: 12516
Gerrit-PatchSet: 7
Gerrit-Owner: Yingchun Lai <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-Reviewer: Yingchun Lai <[email protected]>
Gerrit-Comment-Date: Fri, 15 Mar 2019 18:28:56 +0000
Gerrit-HasComments: Yes

Reply via email to