Will Berkeley has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12167 )

Change subject: [tools] Add table scan tool
......................................................................


Patch Set 4:

(5 comments)

I left a few high-level comments, but at the top level: what is the goal of 
this tool? Is it to serve as a barebones shell, like impala-shell or MySQL 
shell? Is it meant for benchmarking scan performance?

http://gerrit.cloudera.org:8080/#/c/12167/4/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/12167/4/src/kudu/tools/kudu-tool-test.cc@2316
PS4, Line 2316: TEST_F(ToolTest, TestScanTable)
This requires a lot more tests. Correctness tests should cover at least the 
following things:

1. Scanning with no predicates and no columns for just a row count (Done).
2. Scanning with each predicate and checking it gets exactly the correct rows 
(includes matching, excludes non matching).
3. Scanning with projections of one or more columns and making sure the right 
columns are returned.
4. Scanning with multiple predicates on one column.
5. Scanning with multiple predicates across different columns.


http://gerrit.cloudera.org:8080/#/c/12167/4/src/kudu/tools/tool_action_table.cc
File src/kudu/tools/tool_action_table.cc:

http://gerrit.cloudera.org:8080/#/c/12167/4/src/kudu/tools/tool_action_table.cc@18
PS4, Line 18: #include <stdlib.h>
Prefer #include <cstdlib>.


http://gerrit.cloudera.org:8080/#/c/12167/4/src/kudu/tools/tool_action_table.cc@76
PS4, Line 76:               "Query predicates on columns. Unlike the 
traditional"
Here and elsewhere, add "\n" to include linebreaks in the help message.


http://gerrit.cloudera.org:8080/#/c/12167/4/src/kudu/tools/tool_action_table.cc@79
PS4, Line 79: WhetherNull
I think we can call this IsNull. It's shorter.


http://gerrit.cloudera.org:8080/#/c/12167/4/src/kudu/tools/tool_action_table.cc@81
PS4, Line 81: '[', '(', '=', ')' or ']'"
I find these really confusing. For example, in the sample from the help message:

    col1:[:lower;col1:]:upper;col2:@:v1,v2,v3;col3:n:NULL

My brain matches up [] pairs and wants to group together what's between them, 
so I want to interpret '[:lower;col1:]' as something meaningful when it's split 
between two predicates. Could we use friendlier symbols for the comparison 
operators? Can we just use < for <, <= for <=, etc?

Also, a few other things we ought to try and address with the syntax (or maybe 
just with documentation and help messages):

1. How can a user query for, e.g. 'KEY <= 10 AND KEY > 0'? Use two predicates?
2. Could we use = for equality and in-list predicates? E.g. 'col0:=:5' and 
'col0:=:5,6,7,8'?
3. Why do we need the ":NULL" at the end of 'col3:n:NULL'? It's easier to 
parse, for sure, but it conveys no meaning.
4. How would a user do an equality predicate on a string with a comma in it?

I think it might be a good idea to use a simple JSON syntax for the predicates. 
This gives us a defined way to handle all the different characters, at the cost 
of some verbosity.



--
To view, visit http://gerrit.cloudera.org:8080/12167
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ieac340b70a9eaf131f82a2b7d61336211d1d48f8
Gerrit-Change-Number: 12167
Gerrit-PatchSet: 4
Gerrit-Owner: Yingchun Lai <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-Comment-Date: Tue, 08 Jan 2019 21:30:41 +0000
Gerrit-HasComments: Yes

Reply via email to