Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
......................................................................


Patch Set 3:

(14 comments)

next batch... I'm probably about 1/2 way through.

http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java:

PS1, Line 275: 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
> Actually, I don't find it such a terrible idea. The sooner we figure out in
It's just more code to support and test...


http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/analysis/DistributeParam.java
File fe/src/main/java/com/cloudera/impala/analysis/DistributeParam.java:

PS1, Line 138: 
             : 
             : 
> There is a restriction on the types of primary keys (no bool, float or doub
Thanks, makes sense


http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java:

Line 87:   InsertStmt getInsertStmt() { return insertStmt_; }
why change visibility?


http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java:

PS3, Line 344:   
nit extra space


PS3, Line 344: getTblPrimaryKeyColumnNames(),  null,
Why would there be PK col names?

Can you add a neg test case to AnalyzeDDLTest.java?

    AnalysisError("create table newkudutbl like kudu 
'/test-warehouse/schemas/alltypestiny.parquet'")


http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java:

PS3, Line 167:     if (getFileFormat() != THdfsFileFormat.KUDU && 
KuduTable.KUDU_STORAGE_HANDLER.equals(
             :         getTblProperties().get(KuduTable.KEY_STORAGE_HANDLER))) {
             :       throw new 
AnalysisException(KUDU_STORAGE_HANDLER_ERROR_MESSAGE);
             :     }
             : 
             :     // Avro tables can have empty column defs because they can 
infer them from the Avro
             :     // schema. Likewise for external Kudu tables, the schema can 
be read from Kudu.
             :     if (getColumnDefs().isEmpty() && getFileFormat() != 
THdfsFileFormat.AVRO
             :         && getFileFormat() != THdfsFileFormat.KUDU) {
             :       throw new AnalysisException("Table requires at least 1 
column");
             :     }
             : 
             :     if (getFileFormat() == THdfsFileFormat.KUDU) {
             :       analyzeKuduFormat(analyzer);
             :     } else {
             :       
AnalysisUtils.throwIfNotNullOrNotEmpty(getDistributeParams(),
             :           "Only Kudu tables can use DISTRIBUTE BY clause.");
             :       throwIfPrimaryKeysWereSpecified("Only Kudu tables can 
specify a PRIMARY KEY.");
             :     }
             : 
             :     if (getFileFormat() == THdfsFileFormat.AVRO) {
             :       setColumnDefs(analyzeAvroSchema(analyzer));
             :       if (getColumnDefs().isEmpty()) {
             :         throw new AnalysisException(
             :             "An Avro table requires column definitions or an 
Avro schema.");
             :       }
             :       AvroSchemaUtils.setFromSerdeComment(getColumnDefs());
             :     }
             :   }
what if we were to move everything Kudu related here into analyzeKuduFormat() 
which first checks if it is Kudu. If it is not, it can make sure it _doesn't_ 
have the distribute or PK params, then perform the storage handler check, then 
return. If it is a Kudu table it'd mostly be the same as it is. Basically we 
could just always call the fn and it handles Kudu or no-Kudu checks 
appropriately. Then we'd get rid of some of the Kudu mess here.


PS3, Line 279:         throw new AnalysisException(String.format("Column '%s' 
(type: '%s') cannot " +
             :             "be used as a primary key. Keys must be integer or 
string types.",
             :             pkColDef.getColName(), pkColDef.getType().toSql()));
I don't see a test case for this error


PS3, Line 277:       ColumnDef pkColDef = getPrimaryKeyColumnDefs().get(i);
             :       if (!KuduUtil.isSupportedKeyType(pkColDef.getType())) {
             :         throw new AnalysisException(String.format("Column '%s' 
(type: '%s') cannot " +
             :             "be used as a primary key. Keys must be integer or 
string types.",
             :             pkColDef.getColName(), pkColDef.getType().toSql()));
             :       }
             :       if (!pkColDef.equals(getColumnDefs().get(i))) {
             :         throw new AnalysisException(String.format("Primary key 
columns in Kudu " +
             :             "tables must be declared first and in-order. Key 
column '%s' is " +
             :             "out-of-order.", pkColDef.getColName()));
             :       }
If we weren't to do these checks (i.e. that they're supported types and that 
they come first in the col defs), do you know how we would handle the Kudu 
failures?

Related to my prev comment in #1 about handling distribution param checking is 
just that if they start relaxing these requirements then we have to change it 
and users have to upgrade Impala as well as Kudu. If we can handle Kudu errors 
on create table nicely, it would help us cut down on the amount of code we have.


PS3, Line 315:    * - Only primary key column can be used in distribution 
definitions
             :    * - Distributions may be either hash or ranged (value 
intervals)
             :    * - Multiple distributions can be used/mixed
             :    * - A column can only be used in one hash distribution
             :    * - A range distribution can only be used once and must be 
the last definition
             :    *   (both enforced by the grammar).
What Kudu supports for partitioning is a bit tricky when we get into nested 
schemes, so I'm a bit concerned that (a) we're matching their supported 
functionality perfectly, (b) testing our validation properly, and (c) that 
we'll necessarily catch any changes that they end up making down the road. It 
would be nice if we can get a nice error from them and avoid duplication of 
efforts given the complexity. Let's chat about this in person?


PS3, Line 358: throwIfPrimaryKeysWereSpecified
I do prefer to change this to return a bool and have callers throw: bool 
hasPrimaryKeysSpecified()


http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/DistributeParam.java
File fe/src/main/java/org/apache/impala/analysis/DistributeParam.java:

PS3, Line 156:       if (splitRows_ == null) {
             :         builder.append("...");
can this happen? wouldn't l115 throw?


PS3, Line 191:       if (splitRows_ == null) {
             :         result.setBy_range_param(rangeParam);
             :         return result;
             :       }
this is possible? what does it mean?


http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/catalog/KuduTable.java
File fe/src/main/java/org/apache/impala/catalog/KuduTable.java:

PS3, Line 259: primarily
remove


PS3, Line 258: Creates a tmp Kudu table. The table is not added to the Kudu 
storage engine or the
             :    * HMS.
I wasn't sure what this meant. How about:
Creates a temporary KuduTable object populated with the specified properties 
but has an invalid TableId and is not added to...


-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to