Will Berkeley has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12167 )

Change subject: [tools] Add table scan tool
......................................................................


Patch Set 10:

(25 comments)

Thanks for changing the query language! I think it's nicer now. It looks pretty 
good. I just had one big structural ask which is to separate all the code for 
the scanning into a file separate from tool_action_table.cc.

http://gerrit.cloudera.org:8080/#/c/12167/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12167/10//COMMIT_MSG@9
PS10, Line 9: , several
New sentence: ". Several..."


http://gerrit.cloudera.org:8080/#/c/12167/10//COMMIT_MSG@10
PS10, Line 10: the
Remove.


http://gerrit.cloudera.org:8080/#/c/12167/10//COMMIT_MSG@23
PS10, Line 23: All sub-predicates combined together as
             : '[operator, sub-predicate1, sub-predicate2, ...]', and only 'AND'
             : operator is supported now.
For clarity, I'd rephrase this as:

Predicates can be combined together with predicate operators using the syntax

    [operator, predicate, predicate, ..., predicate].

For example,

    ["AND", [">=", "col1", "value"], ["NOTNULL", "col2"]]

The only supported predicate operator is `AND`.


http://gerrit.cloudera.org:8080/#/c/12167/10/src/kudu/client/scanner-internal.h
File src/kudu/client/scanner-internal.h:

http://gerrit.cloudera.org:8080/#/c/12167/10/src/kudu/client/scanner-internal.h@314
PS10, Line 314: if (direct_data_.empty()) {
              :       return KuduRowResult(projection_, nullptr);
              :     }
Is this change related to the tool? What's its purpose?

It's a semantic change because it doesn't look like the second arg to the 
KuduRowResult constructor would ever be nullptr in the original code.


http://gerrit.cloudera.org:8080/#/c/12167/10/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/12167/10/src/kudu/tools/kudu-tool-test.cc@285
PS10, Line 285: SCOPED_TRACE(*stdout_lines);
              :     ASSERT_OK(s);
Could you add 'stderr' and 'SCOPED_TRACE(stderr)' back in here? Like how it is 
in 'RunActionStderrLines'.


http://gerrit.cloudera.org:8080/#/c/12167/10/src/kudu/tools/kudu-tool-test.cc@394
PS10, Line 394: string projections;
              :     for (const auto& column : columns) {
              :       projections += (column.second + ",");
              :     }
For this sort of pattern, it's also possible to accumulate the values in a 
vector and then use JoinStrings from kudu/gutil/strings/join.h:

    vector<string> acc;
    for (const auto& column : columns) {
      acc.push_back(column.second);
    }
    const string projection = JoinStrings(acc, ",");

This way is fine though...up to you if you want to change it.


http://gerrit.cloudera.org:8080/#/c/12167/10/src/kudu/tools/kudu-tool-test.cc@400
PS10, Line 400: RunActionStderrString
I think we want to use RunActionStdoutLines here. The tool should print the 
rows on stdout, and we should check each line matches the expected format for a 
row.


http://gerrit.cloudera.org:8080/#/c/12167/10/src/kudu/tools/kudu-tool-test.cc@407
PS10, Line 407: string line_pattern(" \\(");
              :       int i = 0;
              :       for (const auto& column : columns) {
              :         // Check matched rows.
              :         line_pattern += Substitute("$0 $1=$2",
              :             column.first, column.second, column.second == "key" 
? to_string(value) : ".*");
              :         if (++i < columns.size()) {
              :           line_pattern += (", ");
              :         }
              :       }
              :       line_pattern += (")");
It'd be possible to simplify this a bit with JoinStrings-- you could get rid of 
'i', for example.


http://gerrit.cloudera.org:8080/#/c/12167/10/src/kudu/tools/kudu-tool-test.cc@421
PS10, Line 421: cost
Remove.


http://gerrit.cloudera.org:8080/#/c/12167/10/src/kudu/tools/kudu-tool-test.cc@2361
PS10, Line 2361: shared_ptr<KuduClient> client;
               :   ASSERT_OK(cluster_->CreateClient(nullptr, &client));
Right now you are not using the client.


http://gerrit.cloudera.org:8080/#/c/12167/10/src/kudu/tools/kudu-tool-test.cc@2365
PS10, Line 2365: &
nit: Use a value instead of reference.


http://gerrit.cloudera.org:8080/#/c/12167/10/src/kudu/tools/kudu-tool-test.cc@2383
PS10, Line 2383: "[\"AND\",[\"=\",\"key\",1]]"
C++ raw string literals would help make this more readable. See 
https://en.cppreference.com/w/cpp/language/string_literal #6. In this case,

    "[\"AND\",[\"=\",\"key\",1]]"

could be

    R"*(["AND",["=","key",1]])*"


http://gerrit.cloudera.org:8080/#/c/12167/10/src/kudu/tools/kudu-tool-test.cc@2404
PS10, Line 2404: // TODO how to match?
You can use the 'client' you created above to insert one more row that has one 
column null. The default schema of the TestWorkload is

    inline Schema GetSimpleTestSchema() {
      return Schema({ ColumnSchema("key", INT32),
                      ColumnSchema("int_val", INT32),
                      ColumnSchema("string_val", STRING, true) },
                    1);
    }

so the 'string_val' column is nullable. Then you'll get back one row here and 
one more row for all the other tests. The NOTNULL predicate test should be 
changed to be on 'string_val'.


http://gerrit.cloudera.org:8080/#/c/12167/10/src/kudu/tools/kudu-tool-test.cc@2414
PS10, Line 2414: shared_ptr<KuduClient> client;
               :   ASSERT_OK(cluster_->CreateClient(nullptr, &client));
Not used.


http://gerrit.cloudera.org:8080/#/c/12167/10/src/kudu/tools/kudu-tool-test.cc@2418
PS10, Line 2418: &
nit: Value instead of reference.


http://gerrit.cloudera.org:8080/#/c/12167/10/src/kudu/tools/kudu-tool-test.cc@2420
PS10, Line 2420:   KuduSchemaBuilder schema_builder;
               :   schema_builder.AddColumn("key")
               :       ->Type(client::KuduColumnSchema::INT32)
               :       ->NotNull()
               :       ->PrimaryKey();
               :   schema_builder.AddColumn("string_value")
               :       ->Type(client::KuduColumnSchema::STRING);
               :   KuduSchema schema;
               :   ASSERT_OK(schema_builder.Build(&schema));
I think using TestWorkload's default schema is fine, since this schema is just 
a projection of that one!


http://gerrit.cloudera.org:8080/#/c/12167/10/src/kudu/tools/kudu-tool-test.cc@2454
PS10, Line 2454: shared_ptr<KuduClient> client;
               :   ASSERT_OK(cluster_->CreateClient(nullptr, &client));
Unused.


http://gerrit.cloudera.org:8080/#/c/12167/10/src/kudu/tools/kudu-tool-test.cc@2460
PS10, Line 2460:   KuduSchemaBuilder schema_builder;
               :   schema_builder.AddColumn("key")
               :       ->Type(client::KuduColumnSchema::INT32)
               :       ->NotNull()
               :       ->PrimaryKey();
               :   schema_builder.AddColumn("string_value")
               :       ->Type(client::KuduColumnSchema::STRING);
               :   KuduSchema schema;
               :   ASSERT_OK(schema_builder.Build(&schema));
Again, should be able to use TestWorkload's default schema for this.


http://gerrit.cloudera.org:8080/#/c/12167/10/src/kudu/tools/kudu-tool-test.cc@2498
PS10, Line 2498: LOG(INFO) << line;
Don't need this once the test works- the asserts should log the useful bit.


http://gerrit.cloudera.org:8080/#/c/12167/10/src/kudu/tools/tool_action_table.cc
File src/kudu/tools/tool_action_table.cc:

http://gerrit.cloudera.org:8080/#/c/12167/10/src/kudu/tools/tool_action_table.cc@91
PS10, Line 91: "All sub-predicates combined together as\n"
             :               "'[operator, sub-predicate1, sub-predicate2, 
...]', and only 'AND' "
             :               "operator is supported now.\n"
             :               "  e.g. '[\"AND\", [\">=\", \"col1\", \"value\"], 
[\"NOTNULL\", \"col2\"]]'");
Update this in the same way as I suggested updating the commit message.


http://gerrit.cloudera.org:8080/#/c/12167/10/src/kudu/tools/tool_action_table.cc@95
PS10, Line 95: DEFINE_bool(show_value, false,
             :             "Whether to show values of scanned rows.");
Why don't we want to show the values by default?


http://gerrit.cloudera.org:8080/#/c/12167/10/src/kudu/tools/tool_action_table.cc@186
PS10, Line 186: AtomicInt<uint64_t> total_count(0);
If you need state, put it in a class.


http://gerrit.cloudera.org:8080/#/c/12167/10/src/kudu/tools/tool_action_table.cc@200
PS10, Line 200: shared_ptr
Please leave this as client::sp::shared_ptr. It's intentional that the prefix 
is left because client::sp::shared_ptr and shared_ptr are not quite the same 
thing and we want to make it clear when the client::sp version is being used.


http://gerrit.cloudera.org:8080/#/c/12167/10/src/kudu/tools/tool_action_table.cc@445
PS10, Line 445: PredicateType ParsePredicateType(const string& predicate_type) {
I think this is enough extra code that it should go in its own file. Maybe 
table_scan.{h, cc} or something.


http://gerrit.cloudera.org:8080/#/c/12167/10/src/kudu/tools/tool_action_table.cc@678
PS10, Line 678: Status ScanTable(const RunnerContext &context) {
Make a class called TableScanner or whatever and put all this stuff in it, and 
put the class in a separate file. There here you just construct the TableScan 
object and call a few methods on it.



--
To view, visit http://gerrit.cloudera.org:8080/12167
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ieac340b70a9eaf131f82a2b7d61336211d1d48f8
Gerrit-Change-Number: 12167
Gerrit-PatchSet: 10
Gerrit-Owner: Yingchun Lai <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-Reviewer: Yingchun Lai <[email protected]>
Gerrit-Comment-Date: Thu, 24 Jan 2019 21:12:23 +0000
Gerrit-HasComments: Yes

Reply via email to