[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

Dimitris Tsirogiannis (Code Review) Tue, 11 Oct 2016 22:20:08 -0700

Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
......................................................................



Patch Set 4:

(54 comments)

http://gerrit.cloudera.org:8080/#/c/4414/4/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

Line 53: enum THdfsFileFormat {
> rename
This change would touch many places. Would you mind postponing it for a follow 
up patch?


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

Line 976:   tbl_def_without_col_defs:tbl_def
> a 'create table' without col defs?
This was added for the EXTERNAL Kudu table use case for which no column 
definitions are specified since we load the schema from the Kudu table. Added a 
comment.


Line 980:     RESULT = new CreateTableStmt(tbl_def); 
> trailing
Done


Line 1033: // class doesn't inherit from CreateTableStmt.
> should it?
To my opinion yes it should. I actually went down the path of refactoring all 
the CREATE TABLE* statements but it ended up being too complex to add on top of 
this big patch. Simplifying the CREATE TABLE statements will also allow us to 
remove some of the weird table option handling we do in TableDef.java. I will 
leave a TODO for now.


Line 1065: primary_keys_val ::=
> opt_primary_keys?
Done


Line 1089: tbl_data_layout ::=
> opt_...?
Done


Line 1139:   {: 
> fix spaces and tabs
Done


Line 1370:   KW_PRIMARY key_ident
> what's wrong with KW_KEY?
I don't think we can do that. Don't we use "key" for nested types (map)?


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java
File fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java:

Line 30:   static void throwIfNotNullOrNotEmpty(Collection<?> c, String message)
> this is actually 'not null *and* not empty'. you can also phrase that as 'n
Good point. Done


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java:

Line 158:     if (fileFormat_ == THdfsFileFormat.KUDU) {
> check at top
Done


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java:

Line 236:         String.format("PRIMARY KEY must be used instead of the table 
property '%s'.",
> not good: if you do that on an external table, you get this error message i
I like that idea but it's a bit more complicated for Kudu because different 
properties are valid depending on whether it's an external or managed table. 
Moved the check to the function below. Let me know if that works ok.


Line 310:       distributeParam.setPKColumnDefMap(pkColumnDefsByName);
> setPkColumn...
Done


Line 315:   private boolean hasPrimaryKeysSpecified() {
> hasPrimaryKeySpecified (there's only one, which can be a composite key)
Done


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/analysis/DistributeParam.java
File fe/src/main/java/org/apache/impala/analysis/DistributeParam.java:

Line 121:           org.apache.impala.catalog.Type colType = colDef.getType();
> does simply Type conflict with something?
Yeah, there is a conflict with the enum Type in this class.


Line 129:           if (colType.isStringType() && !exprType.isStringType()
> this is basically looking for 'assignment compatible', and i'm sure we alre
Done


Line 150:       builder.append(numBuckets_).append(" BUCKETS");
> sprinkle some checkstates in here (on numbuckets and splitrows; or maybe a 
I added checks for numBuckets_. Split rows will go away in a follow up patch 
with the new range partitioning syntax. I left a TODO to add a validate 
function then.


Line 200:             literal.setString_literal(expr.getStringValue());
> checkstate that you're getting something valid
Done


Line 211:   void setPKColumnDefMap(Map<String, ColumnDef> pkColumnDefByName) {
> setPkC...
Done


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

Line 74:   static class TableDefOptions {
> 'Options' is enough
Done


Line 160:     fullyQualifiedTableName_ = analyzer.getFqTableName(getTblName());
> stick with fq abbreviation?
Done


Line 189:     for (ColumnDef colDef: getPartitionColumnDefs()) {
> this is a bit hard to follow. partition cols aren't defined separately, the
These are the columns specified in a PARTITIONED BY clause (non-kudu) and they 
should be analyzed, no? Sorry, I am not sure I follow your comment.


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
File fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java:

PS4, Line 97: org.apache
> should this have changed?
Hm sed is not that smart :)


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/catalog/KuduTable.java
File fe/src/main/java/org/apache/impala/catalog/KuduTable.java:

PS4, Line 89: com.cloudera
> can you add a ref to IMPALA-4271 ?
Done


Line 111:   // Distribution schemes of this Kudu table. Both rang and 
hash-based distributions are
> range
Done


Line 140:     return msTbl.getParameters().get(KuduTable.KEY_TABLE_NAME);
> i don't think this is worth a function call, it just makes the code harder 
Done


PS4, Line 160: know
> known
Done


PS4, Line 159:   /**
             :    * The number of nodes is not know ahead of time and will be 
updated during computeStats
             :    * in the scan node.
             :    */
             :   public int getNumNodes() { return -1; }
> I don't see this used
Yeah, removed.


PS4, Line 175: numClusteringCols_ = 0;
> not really related to this change, but it's kind of confusing to have numCl
I like Marcel's suggestion, I changed it to be the number of primary key 
columns.


PS4, Line 175: numClusteringCols_ = 0;
> those should be the primary key cols
Done


PS4, Line 226:     List<FieldSchema> cols = msTable_.getSd().getCols();
             :     cols.clear();
> why do we get cols from getCols() and then clear() it?
cols is a reference to msTable cols. We clear them here and reload them from 
Kudu schema in L232. Let me know if it's still not clear or if I should add a 
comment.


PS4, Line 232: cols.add(new FieldSchema(colName, type.toSql().toLowerCase(), 
null));
> why do we do this? cols isn't used later
See my comment above. I can add a comment if it's still not clear.


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

PS4, Line 460: msTbl.getTableType().equals(TableType.EXTERNAL_TABLE.toString());
> we shuold probably compare case insensitive to be safe
Done


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

PS4, Line 1147: occurrs
> occurs
Done


PS4, Line 1482:       } catch (Exception e) {
              :         try {
              :           // Error creating the table in HMS, drop the managed 
table from Kudu.
              :           if (!Table.isExternalTable(newTable)) {
              :             KuduCatalogOpExecutor.dropTable(newTable, false);
              :           }
              :         } catch (Exception logged) {
              :           String kuduTableName = 
newTable.getParameters().get(KuduTable.KEY_TABLE_NAME);
              :           LOG.error(String.format("Failed to drop Kudu table 
'%s'", kuduTableName),
              :               logged);
              :           throw new RuntimeException(String.format("Failed to 
create the table '%s' in " +
              :               " the Metastore and the newly created Kudu table 
'%s' could not be " +
              :               " dropped. The log contains more information.", 
newTable.getTableName(),
              :               kuduTableName), e);
              :         }
              :         if (e instanceof AlreadyExistsException && 
params.if_not_exists) return false;
              :         throw new ImpalaRuntimeException(
              :             String.format(HMS_RPC_ERROR_FORMAT_STR, 
"createTable"), e);
> it looks like none of this really needs to be inside the synchronized block
Done


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

PS4, Line 232:     if (!req.is_delta) {
             :       catalog = new ImpaladCatalog(defaultKuduMasterAddrs_);
             :     }
> 1line
Done


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java:

PS4, Line 135:       if (!hasRangePartitioning) {
             :         
tableOpts.setRangePartitionColumns(Collections.<String>emptyList());
             :       }
> I don't think this is necessary
Unfortunately it is. I spoke to Dan (from Kudu team) about it. If the user 
doesn't specify a range partitioning, Kudu by default creates one with all the 
primary key columns. So, the distribute params we get from Kudu (and use in the 
SHOW stmt) is different from the distribute params that the user specified. I 
added a comment to clarify this. Let me know if this is ok.


PS4, Line 175: erros
> errors
Done


PS4, Line 192: cols.clear();
> can you indicate in the comment that this doesn't just populate msTbl's col
Done


PS4, Line 206: new KuduClient
> I'm not crazy about this wrapper class thing. It's only used in this file.
Done


PS4, Line 212: is accessible
> exists
Done


PS4, Line 215: validateTblProperties
> how about validateKuduTblExists ?
Done


PS4, Line 224: Error accessing table in Kudu " +
             :           "master '%s'
> This could also print the name. Also to avoid confusing with potential futu
Done


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/util/KuduClient.java
File fe/src/main/java/org/apache/impala/util/KuduClient.java:

> as I've said I'd vote to remove this, it's only used by 1 class and adds ex
Done


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

Line 1353:         "functional.alltypestiny", "Columns cannot be specified with 
an external " +
> odd error message. i would expect the 'as select' to be the offending part.
Yeah, you're right. Fixed it.


Line 1720:     AnalyzesOk("create table tab (x int, y int, primary key (X)) " +
> i thought kudu is case-sensitive
We lowercase the pk columns during the analysis. Isn't that ok?


http://gerrit.cloudera.org:8080/#/c/4414/4/testdata/workloads/functional-query/queries/QueryTest/kudu_create.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_create.test:

Line 6: as select * from functional.alltypestiny
> shouldn't this be part of an analyzer test?
It used to be that many of these cases were handled during the analysis. Both 
MJ and Alex suggested we avoid performing checks that are already performed in 
Kudu (e.g. no boolean primary key columns). Hence, many of these cases are 
essentially analysis tests that are caught at runtime. Let me know if you 
prefer to move these back to the analysis. The only issue with this would be 
keeping these checks consistent with Kudu.


Line 30:   distribute by hash (x) into 3 buckets stored as kudu
> same here, and for the other analysis error test cases in this file
See comment above.


Line 32: NonRecoverableException: Key column may not have type of BOOL, FLOAT, 
or DOUBLE
> why wouldn't this be an analysis exception?
See comment above.


Line 46: NonRecoverableException: Got out-of-order key column: name: "y" type: 
INT32 is_key: true is_nullable: false cfile_block_size: 0
> inscrutable error message
This comes from Kudu. I agree it is not user friendly. I'll file the Kudu team 
to fix this.


Line 53: NonRecoverableException: must have at least two hash buckets
> error message should point out the offending clause
Error message comes from Kudu. That's the drawback of not doing these checks in 
the analysis. We don't control the error messages :(


Line 60: NonRecoverableException: hash bucket schema components must not 
contain columns in common
> same here
Same comment as above. I understand this is annoying. The goal is to have the 
Kudu team fix these error msgs.


http://gerrit.cloudera.org:8080/#/c/4414/4/testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test:

Line 1: ====
> might be a good idea to point out at the top that this test contains test c
Good idea. Done


Line 5:   DISTRIBUTE BY RANGE SPLIT ROWS ((10), (30)) STORED AS KUDU
> analyzer test?
Done


http://gerrit.cloudera.org:8080/#/c/4414/4/testdata/workloads/functional-query/queries/QueryTest/kudu_partition_ddl.test
File 
testdata/workloads/functional-query/queries/QueryTest/kudu_partition_ddl.test:

Line 6:   DISTRIBUTE BY RANGE(name) SPLIT ROWS (('abc')) STORED AS KUDU
> analyzer test?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Michael Brown <[email protected]>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

Reply via email to