Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables ......................................................................
Patch Set 3: (14 comments) next batch... I'm probably about 1/2 way through. http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java File fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java: PS1, Line 275: : : : : : : : > Actually, I don't find it such a terrible idea. The sooner we figure out in It's just more code to support and test... http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/analysis/DistributeParam.java File fe/src/main/java/com/cloudera/impala/analysis/DistributeParam.java: PS1, Line 138: : : > There is a restriction on the types of primary keys (no bool, float or doub Thanks, makes sense http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java: Line 87: InsertStmt getInsertStmt() { return insertStmt_; } why change visibility? http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java: PS3, Line 344: nit extra space PS3, Line 344: getTblPrimaryKeyColumnNames(), null, Why would there be PK col names? Can you add a neg test case to AnalyzeDDLTest.java? AnalysisError("create table newkudutbl like kudu '/test-warehouse/schemas/alltypestiny.parquet'") http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java: PS3, Line 167: if (getFileFormat() != THdfsFileFormat.KUDU && KuduTable.KUDU_STORAGE_HANDLER.equals( : getTblProperties().get(KuduTable.KEY_STORAGE_HANDLER))) { : throw new AnalysisException(KUDU_STORAGE_HANDLER_ERROR_MESSAGE); : } : : // Avro tables can have empty column defs because they can infer them from the Avro : // schema. Likewise for external Kudu tables, the schema can be read from Kudu. : if (getColumnDefs().isEmpty() && getFileFormat() != THdfsFileFormat.AVRO : && getFileFormat() != THdfsFileFormat.KUDU) { : throw new AnalysisException("Table requires at least 1 column"); : } : : if (getFileFormat() == THdfsFileFormat.KUDU) { : analyzeKuduFormat(analyzer); : } else { : AnalysisUtils.throwIfNotNullOrNotEmpty(getDistributeParams(), : "Only Kudu tables can use DISTRIBUTE BY clause."); : throwIfPrimaryKeysWereSpecified("Only Kudu tables can specify a PRIMARY KEY."); : } : : if (getFileFormat() == THdfsFileFormat.AVRO) { : setColumnDefs(analyzeAvroSchema(analyzer)); : if (getColumnDefs().isEmpty()) { : throw new AnalysisException( : "An Avro table requires column definitions or an Avro schema."); : } : AvroSchemaUtils.setFromSerdeComment(getColumnDefs()); : } : } what if we were to move everything Kudu related here into analyzeKuduFormat() which first checks if it is Kudu. If it is not, it can make sure it _doesn't_ have the distribute or PK params, then perform the storage handler check, then return. If it is a Kudu table it'd mostly be the same as it is. Basically we could just always call the fn and it handles Kudu or no-Kudu checks appropriately. Then we'd get rid of some of the Kudu mess here. PS3, Line 279: throw new AnalysisException(String.format("Column '%s' (type: '%s') cannot " + : "be used as a primary key. Keys must be integer or string types.", : pkColDef.getColName(), pkColDef.getType().toSql())); I don't see a test case for this error PS3, Line 277: ColumnDef pkColDef = getPrimaryKeyColumnDefs().get(i); : if (!KuduUtil.isSupportedKeyType(pkColDef.getType())) { : throw new AnalysisException(String.format("Column '%s' (type: '%s') cannot " + : "be used as a primary key. Keys must be integer or string types.", : pkColDef.getColName(), pkColDef.getType().toSql())); : } : if (!pkColDef.equals(getColumnDefs().get(i))) { : throw new AnalysisException(String.format("Primary key columns in Kudu " + : "tables must be declared first and in-order. Key column '%s' is " + : "out-of-order.", pkColDef.getColName())); : } If we weren't to do these checks (i.e. that they're supported types and that they come first in the col defs), do you know how we would handle the Kudu failures? Related to my prev comment in #1 about handling distribution param checking is just that if they start relaxing these requirements then we have to change it and users have to upgrade Impala as well as Kudu. If we can handle Kudu errors on create table nicely, it would help us cut down on the amount of code we have. PS3, Line 315: * - Only primary key column can be used in distribution definitions : * - Distributions may be either hash or ranged (value intervals) : * - Multiple distributions can be used/mixed : * - A column can only be used in one hash distribution : * - A range distribution can only be used once and must be the last definition : * (both enforced by the grammar). What Kudu supports for partitioning is a bit tricky when we get into nested schemes, so I'm a bit concerned that (a) we're matching their supported functionality perfectly, (b) testing our validation properly, and (c) that we'll necessarily catch any changes that they end up making down the road. It would be nice if we can get a nice error from them and avoid duplication of efforts given the complexity. Let's chat about this in person? PS3, Line 358: throwIfPrimaryKeysWereSpecified I do prefer to change this to return a bool and have callers throw: bool hasPrimaryKeysSpecified() http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/DistributeParam.java File fe/src/main/java/org/apache/impala/analysis/DistributeParam.java: PS3, Line 156: if (splitRows_ == null) { : builder.append("..."); can this happen? wouldn't l115 throw? PS3, Line 191: if (splitRows_ == null) { : result.setBy_range_param(rangeParam); : return result; : } this is possible? what does it mean? http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: PS3, Line 259: primarily remove PS3, Line 258: Creates a tmp Kudu table. The table is not added to the Kudu storage engine or the : * HMS. I wasn't sure what this meant. How about: Creates a temporary KuduTable object populated with the specified properties but has an invalid TableId and is not added to... -- To view, visit http://gerrit.cloudera.org:8080/4414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-HasComments: Yes
