Alex Behm has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
......................................................................


Patch Set 7:

(29 comments)

http://gerrit.cloudera.org:8080/#/c/4414/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

Line 1350:     // CTAS in an external Kudu table
What's the rationale for not allowing this?


Line 1716:     AnalyzesOk("create table tab (x int, y string, primary key(x, 
y)) " +
Can you add a brief comment explaining the table layout for this one?


Line 1717:         "distribute by hash(x) into 3 buckets, range(y) split rows " 
+
can we distribute by range, hash?


Line 1723:     AnalyzesOk("create table tab (a int, b int, c int, d int, 
primary key (a, b, c))" +
What happens if I specify PARTITION BY for a Kudu table?


Line 1739:     AnalysisError(String.format("create table tab (x int) distribute 
by hash (x) " +
What about specifying the other params in tblproperties like the distribute by 
and split rows.


Line 1748:     AnalysisError("create table tab (x int primary key, primary 
key(x)) stored " +
Add negative test where two ColumnDefs are declared as PK.


Line 1762:     AnalysisError("create table tab (a int, b int, c int, d int, 
primary key(a, b, c)) " +
Do we have tests somewhere for checking which types are supported with Kudu? We 
should make sure that:
* you can create a table with all supported types (and same for the specific 
clauses like primary key, distributed by, etc.)
* you cannot create tables with unsupported types like TIMESTAMP/DECIMAL and 
nested types (should fail gracefully with "not supported")
* or alternatively, if we want to defer the type checks to Kudu (and not bake 
it into Impala analysis), then we should document that somewhere


Line 1772:     // No float split keys
Even if the column type is float? If so, might want to test that.


Line 1810:     // DISTRIBUTE BY is required for managed tables.
Primary key is also required, do we have a test?


Line 1812:         "Table partitioning must be specified for managed Kudu 
tables.");
Let's rephrase this as "Table distribution must be specified ..."


PS6, Line 1828: AnalysisError("create external table t tblproperties (" +
Not that it makes any sense, but hat happens with:

create external table t (primary key()) ...


Line 1863:   public void TestCreateAvroTest() {
Add a test with some of the Kudu-specific clauses and STORED AS AVRO


Line 2822:   public void TestShowFiles() throws AnalysisException {
DO we have a TODO/JIRA somewhere to go through all DDL/SHOW commands and make 
them behave properly for Kudu?


http://gerrit.cloudera.org:8080/#/c/4414/6/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
File fe/src/test/java/org/apache/impala/analysis/ParserTest.java:

Line 1632
Why remove this? It will break my setup :)


Line 2235:     ParsesOk("CREATE TABLE foo (i INT PRIMARY KEY, PRIMARY KEY(i)) 
STORED AS KUDU");
Add case like this:
CREATE TABLE foo (i INT PRIMARY KEY, j PRIMARY KEY) STORED AS KUDU


Line 2541:     ParsesOk("CREATE TABLE Foo TBLPROPERTIES ('a'='b', 'c'='d') AS 
SELECT * from bar");
nit: let's be consistent with how we chop up string literals across lines 
(space at end of previous line xor beginning of next line)


http://gerrit.cloudera.org:8080/#/c/4414/6/fe/src/test/java/org/apache/impala/catalog/HdfsStorageDescriptorTest.java
File fe/src/test/java/org/apache/impala/catalog/HdfsStorageDescriptorTest.java:

Line 58:         "org.apache.impala.hive.serde.ParquetInputFormat",
revert


http://gerrit.cloudera.org:8080/#/c/4414/6/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:

Line 203:   if row_format and file_format == 'text':
I think row_format is also valid for sequence files (maybe we're not using it 
though)


http://gerrit.cloudera.org:8080/#/c/4414/6/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

Line 823: distribute by range(id) split rows ((1003), (1007)) stored as kudu;
ah so much better


http://gerrit.cloudera.org:8080/#/c/4414/6/testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
File 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test:

Line 218: create table kudu_table_clone like functional_kudu.alltypes
Why can't we check this in analysis?


http://gerrit.cloudera.org:8080/#/c/4414/6/testdata/workloads/functional-query/queries/QueryTest/kudu_create.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_create.test:

Line 3: # This test file contains several cases for what basically amount to 
analysis errors,
Should mention that some of this behavior is intentional and why.


Line 11: ImpalaRuntimeException: Type TIMESTAMP is not supported in Kudu
Do we get the same message for DECIMAL and complex types? (No need to add 
another test, just asking whether you've checked).


Line 14: create table t primary key (id) distribute by hash (id) into 3 buckets
Add test for creating and querying a table that has Impala keywords as the 
table name and some column names.


http://gerrit.cloudera.org:8080/#/c/4414/6/tests/common/kudu_test_suite.py
File tests/common/kudu_test_suite.py:

Line 69:   def get_db_name(cls):
Why not use the unique_database fixture? Sure, we don't run in parallel but 
unique_database seems saner (no need to explain all these framework intricacies)


Line 94:        name will be used. If 'prepend_db_name' is True, the table name 
will be prepended
what does the db_name parameter do then?


http://gerrit.cloudera.org:8080/#/c/4414/6/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

Line 57:   def test_insert_update_delete(self, vector, unique_database):
Can we keep the .test file name and the python test function consistent? i.e. 
rename with to test_kudu_crud


http://gerrit.cloudera.org:8080/#/c/4414/7/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

Line 192:        'show_create_sql' can be templates that can be used with 
str.format(). format()
In our other SHOW CREATE TABLE tests we also try to run execute the output of 
SHOW CREATE TABLE


Line 213:         TBLPROPERTIES 
('kudu.master_addresses'='{kudu_addr}')""".format(
Why include these TBLPROPERTIES?


Line 226:         DISTRIBUTE BY HASH (c) INTO 3 BUCKETS, RANGE (c) SPLIT ROWS 
(...)
is the SPLIT ROWS (...) legal syntax?


-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to