[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Internal Jenkins has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 13: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables With this commit we simplify the syntax and handling of CREATE TABLE statements for both managed and external Kudu tables. Syntax example: CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b)) DISTRIBUTE BY HASH (a) INTO 3 BUCKETS, RANGE (b) SPLIT ROWS (('abc', 'def')) STORED AS KUDU Changes: 1) Remove the requirement to specify table properties such as key columns in tblproperties. 2) Read table schema (column definitions, primary keys, and distribution schemes) from Kudu instead of the HMS. 3) For external tables, the Kudu table is now required to exist at the time of creation in Impala. 4) Disallow table properties that could conflict with an existing table. Ex: key_columns cannot be specified. 5) Add KUDU as a file format. 6) Add a startup flag to impalad to specify the default Kudu master addresses. The flag is used as the default value for the table property kudu_master_addresses but it can still be overriden using TBLPROPERTIES. 7) Fix a post merge issue (IMPALA-3178) where DROP DATABASE CASCADE wasn't implemented for Kudu tables and silently ignored. The Kudu tables wouldn't be removed in Kudu. 8) Remove DDL delegates. There was only one functional delegate (for Kudu) the existence of the other delegate and the use of delegates in general has led to confusion. The Kudu delegate only exists to provide functionality missing from Hive. 9) Add PRIMARY KEY at the column and table level. This syntax is fairly standard. When used at the column level, only one column can be marked as a key. When used at the table level, multiple columns can be used as a key. Only Kudu tables are allowed to use PRIMARY KEY. The old "kudu.key_columns" table property is no longer accepted though it is still used internally. "PRIMARY" is now a keyword. The ident style declaration is used for "KEY" because it is also used for nested map types. 10) For managed tables, infer a Kudu table name if none was given. The table property "kudu.table_name" is optional for managed tables and is required for external tables. If for a managed table a Kudu table name is not provided, a table name will be generated based on the HMS database and table name. 11) Use Kudu master as the source of truth for table metadata instead of HMS when a table is loaded or refreshed. Table/column metadata are cached in the catalog and are stored in HMS in order to be able to use table and column statistics. Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Reviewed-on: http://gerrit.cloudera.org:8080/4414 Reviewed-by: Alex Behm Tested-by: Internal Jenkins --- M be/src/service/frontend.cc M bin/start-catalogd.sh M bin/start-impala-cluster.py M common/thrift/CatalogObjects.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup A fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableDataSrcStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/DistributeParam.java M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java A fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java A fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/Type.java D fe/src/main/java/org/apache/impala/catalog/delegates/DdlDelegate.java D fe/src/main/java/org/apache/impala/catalog/delegates/KuduDdlDelegate.java D fe/src/main/java/org/apache/impala/catalog/delegates/UnsupportedOpDelegate.java M fe/src/main/java/org/apache/impala/planner/HdfsPartitionFilter.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/JniCata
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Alex Behm has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 13: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Hello Michael Brown, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4414 to look at the new patch set (#13). Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables With this commit we simplify the syntax and handling of CREATE TABLE statements for both managed and external Kudu tables. Syntax example: CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b)) DISTRIBUTE BY HASH (a) INTO 3 BUCKETS, RANGE (b) SPLIT ROWS (('abc', 'def')) STORED AS KUDU Changes: 1) Remove the requirement to specify table properties such as key columns in tblproperties. 2) Read table schema (column definitions, primary keys, and distribution schemes) from Kudu instead of the HMS. 3) For external tables, the Kudu table is now required to exist at the time of creation in Impala. 4) Disallow table properties that could conflict with an existing table. Ex: key_columns cannot be specified. 5) Add KUDU as a file format. 6) Add a startup flag to impalad to specify the default Kudu master addresses. The flag is used as the default value for the table property kudu_master_addresses but it can still be overriden using TBLPROPERTIES. 7) Fix a post merge issue (IMPALA-3178) where DROP DATABASE CASCADE wasn't implemented for Kudu tables and silently ignored. The Kudu tables wouldn't be removed in Kudu. 8) Remove DDL delegates. There was only one functional delegate (for Kudu) the existence of the other delegate and the use of delegates in general has led to confusion. The Kudu delegate only exists to provide functionality missing from Hive. 9) Add PRIMARY KEY at the column and table level. This syntax is fairly standard. When used at the column level, only one column can be marked as a key. When used at the table level, multiple columns can be used as a key. Only Kudu tables are allowed to use PRIMARY KEY. The old "kudu.key_columns" table property is no longer accepted though it is still used internally. "PRIMARY" is now a keyword. The ident style declaration is used for "KEY" because it is also used for nested map types. 10) For managed tables, infer a Kudu table name if none was given. The table property "kudu.table_name" is optional for managed tables and is required for external tables. If for a managed table a Kudu table name is not provided, a table name will be generated based on the HMS database and table name. 11) Use Kudu master as the source of truth for table metadata instead of HMS when a table is loaded or refreshed. Table/column metadata are cached in the catalog and are stored in HMS in order to be able to use table and column statistics. Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 --- M be/src/service/frontend.cc M bin/start-catalogd.sh M bin/start-impala-cluster.py M common/thrift/CatalogObjects.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup A fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableDataSrcStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/DistributeParam.java M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java A fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java A fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/Type.java D fe/src/main/java/org/apache/impala/catalog/delegates/DdlDelegate.java D fe/src/main/java/org/apache/impala/catalog/delegates/KuduDdlDelegate.java D fe/src/main/java/org/apache/impala/catalog/delegates/UnsupportedOpDelegate.java M fe/src/main/java/org/apache/impala/planner/HdfsPartitionFilter.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/JniCatalog
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 12: (2 comments) http://gerrit.cloudera.org:8080/#/c/4414/12/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: Line 280: if (options_.fileFormat == THdfsFileFormat.KUDU) { > This doesn't seem right. We used to be able to change the ROW FORMAT for TE As we talked offline, I reverted this function to what used to be with the addition of an extra check for Kudu tables that don't support ROW FORMAT. Done http://gerrit.cloudera.org:8080/#/c/4414/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: Line 1209: org.apache.hadoop.hive.metastore.api.Table msTbl = existingTbl.getMetaStoreTable(); > Should we only do this if existingTbl is not loaded? As we talked earlier, this handles the most common cases. If an HMS table is dropped externally, then we have no way of deleting the underlying Kudu table as the needed information is stored in the tblproperties of msTbl. -- To view, visit http://gerrit.cloudera.org:8080/4414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Alex Behm has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 12: (2 comments) http://gerrit.cloudera.org:8080/#/c/4414/12/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: Line 280: if (options_.fileFormat == THdfsFileFormat.KUDU) { This doesn't seem right. We used to be able to change the ROW FORMAT for TEXT and SEQUENCE. What was wrong with the previous code? http://gerrit.cloudera.org:8080/#/c/4414/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: Line 1209: org.apache.hadoop.hive.metastore.api.Table msTbl = existingTbl.getMetaStoreTable(); Should we only do this if existingTbl is not loaded? Some possible inconsistency issues that we should probably figure out sometime: The HMS table may have already been dropped, and then we'll not drop the corresponding Kudu table. -- To view, visit http://gerrit.cloudera.org:8080/4414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 12: Code-Review+1 Fixed some test loading issues and modified the dropTable() to properly handle missing Kudu tables. Running exhaustive tests. -- To view, visit http://gerrit.cloudera.org:8080/4414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Hello Michael Brown, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4414 to look at the new patch set (#12). Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables With this commit we simplify the syntax and handling of CREATE TABLE statements for both managed and external Kudu tables. Syntax example: CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b)) DISTRIBUTE BY HASH (a) INTO 3 BUCKETS, RANGE (b) SPLIT ROWS (('abc', 'def')) STORED AS KUDU Changes: 1) Remove the requirement to specify table properties such as key columns in tblproperties. 2) Read table schema (column definitions, primary keys, and distribution schemes) from Kudu instead of the HMS. 3) For external tables, the Kudu table is now required to exist at the time of creation in Impala. 4) Disallow table properties that could conflict with an existing table. Ex: key_columns cannot be specified. 5) Add KUDU as a file format. 6) Add a startup flag to impalad to specify the default Kudu master addresses. The flag is used as the default value for the table property kudu_master_addresses but it can still be overriden using TBLPROPERTIES. 7) Fix a post merge issue (IMPALA-3178) where DROP DATABASE CASCADE wasn't implemented for Kudu tables and silently ignored. The Kudu tables wouldn't be removed in Kudu. 8) Remove DDL delegates. There was only one functional delegate (for Kudu) the existence of the other delegate and the use of delegates in general has led to confusion. The Kudu delegate only exists to provide functionality missing from Hive. 9) Add PRIMARY KEY at the column and table level. This syntax is fairly standard. When used at the column level, only one column can be marked as a key. When used at the table level, multiple columns can be used as a key. Only Kudu tables are allowed to use PRIMARY KEY. The old "kudu.key_columns" table property is no longer accepted though it is still used internally. "PRIMARY" is now a keyword. The ident style declaration is used for "KEY" because it is also used for nested map types. 10) For managed tables, infer a Kudu table name if none was given. The table property "kudu.table_name" is optional for managed tables and is required for external tables. If for a managed table a Kudu table name is not provided, a table name will be generated based on the HMS database and table name. 11) Use Kudu master as the source of truth for table metadata instead of HMS when a table is loaded or refreshed. Table/column metadata are cached in the catalog and are stored in HMS in order to be able to use table and column statistics. Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 --- M be/src/service/frontend.cc M bin/start-catalogd.sh M bin/start-impala-cluster.py M common/thrift/CatalogObjects.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup A fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableDataSrcStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/DistributeParam.java M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java A fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java A fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/Type.java D fe/src/main/java/org/apache/impala/catalog/delegates/DdlDelegate.java D fe/src/main/java/org/apache/impala/catalog/delegates/KuduDdlDelegate.java D fe/src/main/java/org/apache/impala/catalog/delegates/UnsupportedOpDelegate.java M fe/src/main/java/org/apache/impala/planner/HdfsPartitionFilter.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/JniCatalog
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Alex Behm has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 11: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Hello Michael Brown, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4414 to look at the new patch set (#11). Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables With this commit we simplify the syntax and handling of CREATE TABLE statements for both managed and external Kudu tables. Syntax example: CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b)) DISTRIBUTE BY HASH (a) INTO 3 BUCKETS, RANGE (b) SPLIT ROWS (('abc', 'def')) STORED AS KUDU Changes: 1) Remove the requirement to specify table properties such as key columns in tblproperties. 2) Read table schema (column definitions, primary keys, and distribution schemes) from Kudu instead of the HMS. 3) For external tables, the Kudu table is now required to exist at the time of creation in Impala. 4) Disallow table properties that could conflict with an existing table. Ex: key_columns cannot be specified. 5) Add KUDU as a file format. 6) Add a startup flag to impalad to specify the default Kudu master addresses. The flag is used as the default value for the table property kudu_master_addresses but it can still be overriden using TBLPROPERTIES. 7) Fix a post merge issue (IMPALA-3178) where DROP DATABASE CASCADE wasn't implemented for Kudu tables and silently ignored. The Kudu tables wouldn't be removed in Kudu. 8) Remove DDL delegates. There was only one functional delegate (for Kudu) the existence of the other delegate and the use of delegates in general has led to confusion. The Kudu delegate only exists to provide functionality missing from Hive. 9) Add PRIMARY KEY at the column and table level. This syntax is fairly standard. When used at the column level, only one column can be marked as a key. When used at the table level, multiple columns can be used as a key. Only Kudu tables are allowed to use PRIMARY KEY. The old "kudu.key_columns" table property is no longer accepted though it is still used internally. "PRIMARY" is now a keyword. The ident style declaration is used for "KEY" because it is also used for nested map types. 10) For managed tables, infer a Kudu table name if none was given. The table property "kudu.table_name" is optional for managed tables and is required for external tables. If for a managed table a Kudu table name is not provided, a table name will be generated based on the HMS database and table name. 11) Use Kudu master as the source of truth for table metadata instead of HMS when a table is loaded or refreshed. Table/column metadata are cached in the catalog and are stored in HMS in order to be able to use table and column statistics. Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 --- M be/src/service/frontend.cc M bin/start-catalogd.sh M bin/start-impala-cluster.py M common/thrift/CatalogObjects.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup A fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableDataSrcStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/DistributeParam.java M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java A fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java A fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/Type.java D fe/src/main/java/org/apache/impala/catalog/delegates/DdlDelegate.java D fe/src/main/java/org/apache/impala/catalog/delegates/KuduDdlDelegate.java D fe/src/main/java/org/apache/impala/catalog/delegates/UnsupportedOpDelegate.java M fe/src/main/java/org/apache/impala/planner/HdfsPartitionFilter.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/4414/7/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: Line 358: 1: required list values > Why add a new special purpose structure that is redundant with TExpr? To me If that's ok, I'll defer that for now. There are a bunch of stuff in KuduUtil related to split rows that make use of TRangeLiteral which will btw change in the range partitioning patch. So, I'll fix it there for good. Added a TODO. -- To view, visit http://gerrit.cloudera.org:8080/4414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Hello Michael Brown, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4414 to look at the new patch set (#10). Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables With this commit we simplify the syntax and handling of CREATE TABLE statements for both managed and external Kudu tables. Syntax example: CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b)) DISTRIBUTE BY HASH (a) INTO 3 BUCKETS, RANGE (b) SPLIT ROWS (('abc', 'def')) STORED AS KUDU Changes: 1) Remove the requirement to specify table properties such as key columns in tblproperties. 2) Read table schema (column definitions, primary keys, and distribution schemes) from Kudu instead of the HMS. 3) For external tables, the Kudu table is now required to exist at the time of creation in Impala. 4) Disallow table properties that could conflict with an existing table. Ex: key_columns cannot be specified. 5) Add KUDU as a file format. 6) Add a startup flag to impalad to specify the default Kudu master addresses. The flag is used as the default value for the table property kudu_master_addresses but it can still be overriden using TBLPROPERTIES. 7) Fix a post merge issue (IMPALA-3178) where DROP DATABASE CASCADE wasn't implemented for Kudu tables and silently ignored. The Kudu tables wouldn't be removed in Kudu. 8) Remove DDL delegates. There was only one functional delegate (for Kudu) the existence of the other delegate and the use of delegates in general has led to confusion. The Kudu delegate only exists to provide functionality missing from Hive. 9) Add PRIMARY KEY at the column and table level. This syntax is fairly standard. When used at the column level, only one column can be marked as a key. When used at the table level, multiple columns can be used as a key. Only Kudu tables are allowed to use PRIMARY KEY. The old "kudu.key_columns" table property is no longer accepted though it is still used internally. "PRIMARY" is now a keyword. The ident style declaration is used for "KEY" because it is also used for nested map types. 10) For managed tables, infer a Kudu table name if none was given. The table property "kudu.table_name" is optional for managed tables and is required for external tables. If for a managed table a Kudu table name is not provided, a table name will be generated based on the HMS database and table name. 11) Use Kudu master as the source of truth for table metadata instead of HMS when a table is loaded or refreshed. Table/column metadata are cached in the catalog and are stored in HMS in order to be able to use table and column statistics. Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 --- M be/src/service/frontend.cc M bin/start-catalogd.sh M bin/start-impala-cluster.py M common/thrift/CatalogObjects.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup A fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableDataSrcStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/DistributeParam.java M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java A fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java A fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/Type.java D fe/src/main/java/org/apache/impala/catalog/delegates/DdlDelegate.java D fe/src/main/java/org/apache/impala/catalog/delegates/KuduDdlDelegate.java D fe/src/main/java/org/apache/impala/catalog/delegates/UnsupportedOpDelegate.java M fe/src/main/java/org/apache/impala/planner/HdfsPartitionFilter.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 9: Code-Review+1 (9 comments) Carry mikeb's +1 for python tests. http://gerrit.cloudera.org:8080/#/c/4414/7/be/src/service/frontend.cc File be/src/service/frontend.cc: Line 62: "value should be a comma separated list of hostnames or IP addresses."); > Add that detail to the description. Done http://gerrit.cloudera.org:8080/#/c/4414/9/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: Line 1047: // Used for creating tables where the schema is inferred externally, e.g., from an Avro > whitespace Done http://gerrit.cloudera.org:8080/#/c/4414/7/fe/src/main/java/org/apache/impala/analysis/DistributeParam.java File fe/src/main/java/org/apache/impala/analysis/DistributeParam.java: Line 105: for (String colName: colNames_) { > I was wondering whether we should allow something like DISTRIBUTED BY(a, a, Actually this case is rejected by Kudu. Added a test in kudu_create.test. http://gerrit.cloudera.org:8080/#/c/4414/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: Line 1350: // CTAS in an external Kudu table > I was thinking it would mean the same thing it does for HDFS tables. After I see. Let's defer this to a follow up patch if you don't mind. Besides there are other CREATE TABLE statements that we need to revisit for Kudu such as CREATE TABLE LIKE. As is right now, this change would require non-substantial changes to the way we analyze and execute this statement. Filed a JIRA and added a TODO. http://gerrit.cloudera.org:8080/#/c/4414/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: Line 1719: // bucket is partitioned into 4 tables based on the split points of 'y'. > tables == tablets? :) yes. Done http://gerrit.cloudera.org:8080/#/c/4414/6/testdata/bin/generate-schema-statements.py File testdata/bin/generate-schema-statements.py: Line 203: if row_format and file_format == 'text': > There are a few examples here: That's fine. I changed it to be on the safe side. http://gerrit.cloudera.org:8080/#/c/4414/9/testdata/workloads/functional-query/queries/QueryTest/kudu_create.test File testdata/workloads/functional-query/queries/QueryTest/kudu_create.test: Line 82: create table `add`(`analytic` int primary key) distribute by hash (`analytic`) > also cover the range() clause in this same test Done http://gerrit.cloudera.org:8080/#/c/4414/7/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: Line 204: """ > File JIRA and defer. Done KUDU-1711 Line 213: ) > Thanks for the explanation, seems fine. Done -- To view, visit http://gerrit.cloudera.org:8080/4414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Michael Brown has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 9: Code-Review+1 for the python tests -- To view, visit http://gerrit.cloudera.org:8080/4414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Alex Behm has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 9: (3 comments) I'll wait for the next change set and then do a final round over everything. http://gerrit.cloudera.org:8080/#/c/4414/9/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: Line 1047: // Used for creating tables where the schema is inferred externally, e.g., from an Avro whitespace http://gerrit.cloudera.org:8080/#/c/4414/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: Line 1719: // bucket is partitioned into 4 tables based on the split points of 'y'. tables == tablets? http://gerrit.cloudera.org:8080/#/c/4414/9/testdata/workloads/functional-query/queries/QueryTest/kudu_create.test File testdata/workloads/functional-query/queries/QueryTest/kudu_create.test: Line 82: create table `add`(`analytic` int primary key) distribute by hash (`analytic`) also cover the range() clause in this same test -- To view, visit http://gerrit.cloudera.org:8080/4414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Alex Behm has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 7: (10 comments) Responses to comments. Doing another round now. http://gerrit.cloudera.org:8080/#/c/4414/7/be/src/service/frontend.cc File be/src/service/frontend.cc: Line 62: "value should be a comma separated list of hostnames or IP addresses."); > Ports are optional. Add that detail to the description. http://gerrit.cloudera.org:8080/#/c/4414/7/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: Line 358: 1: required list values > Hm, TExpr? That sounds a bit too much for simple literal values unless we e Why add a new special purpose structure that is redundant with TExpr? To me a TExpr is pretty clear, but without looking at the definition, I don't really know what a TRangeLiteral is. http://gerrit.cloudera.org:8080/#/c/4414/7/fe/src/main/java/org/apache/impala/analysis/DistributeParam.java File fe/src/main/java/org/apache/impala/analysis/DistributeParam.java: Line 105: for (String colName: colNames_) { > Not sure I follow. Are we doing something to prevent this? I was wondering whether we should allow something like DISTRIBUTED BY(a, a, a, a, a) or whether we should reject that because 'a' is mentioned multiple times. We reject such cases for PKs. http://gerrit.cloudera.org:8080/#/c/4414/7/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java: Line 231: } catch (Exception e) { > I don't think we can differentiate these two given the way the Kudu client Got it, thanks! http://gerrit.cloudera.org:8080/#/c/4414/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: Line 1350: // CTAS in an external Kudu table > Hm, what does it really mean to allow CTAS on external Kudu table? We have I was thinking it would mean the same thing it does for HDFS tables. After the CTAS we have a new external table with a schema according to the query statement. The fact that it's a Kudu and not an HDFS table seems like an insignificant detail from a user's point of view. In other words, at a high-level it's not very clear to me why Kudu would behave any different than HDFS. I get that the implementation may be weird. Let's revisit in a follow-on change, if necessary. Line 1739: AnalysisError(String.format("create table tab (x int) distribute by hash (x) " + > Well these were never specified in TBLPROPERTIES so I am not sure if we sho I see. Most of the metadata is in Kudu now. Nice! http://gerrit.cloudera.org:8080/#/c/4414/6/testdata/bin/generate-schema-statements.py File testdata/bin/generate-schema-statements.py: Line 203: if row_format and file_format == 'text': > Hm, I couldn't find anything in the docs or in our tests. There are a few examples here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL No idea whether we'll actually do the right thing http://gerrit.cloudera.org:8080/#/c/4414/7/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: Line 204: CREATE TABLE {table} (c INT PRIMARY KEY) > That's interesting. Actually, currently don't work because we can't persist File JIRA and defer. Line 213: TBLPROPERTIES ('kudu.master_addresses'='{kudu_addr}')""".format( > Good question. If the user doesn't specify the kudu master_addresses we pop Thanks for the explanation, seems fine. Line 226: DISTRIBUTE BY HASH (c) INTO 3 BUCKETS, RANGE (c) SPLIT ROWS (...) > No, but Kudu's API doesn't currently return any information about range par Got it, thanks. -- To view, visit http://gerrit.cloudera.org:8080/4414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Dimitris Tsirogiannis has uploaded a new patch set (#9). Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables With this commit we simplify the syntax and handling of CREATE TABLE statements for both managed and external Kudu tables. Syntax example: CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b)) DISTRIBUTE BY HASH (a) INTO 3 BUCKETS, RANGE (b) SPLIT ROWS (('abc', 'def')) STORED AS KUDU Changes: 1) Remove the requirement to specify table properties such as key columns in tblproperties. 2) Read table schema (column definitions, primary keys, and distribution schemes) from Kudu instead of the HMS. 3) For external tables, the Kudu table is now required to exist at the time of creation in Impala. 4) Disallow table properties that could conflict with an existing table. Ex: key_columns cannot be specified. 5) Add KUDU as a file format. 6) Add a startup flag to impalad to specify the default Kudu master addresses. The flag is used as the default value for the table property kudu_master_addresses but it can still be overriden using TBLPROPERTIES. 7) Fix a post merge issue (IMPALA-3178) where DROP DATABASE CASCADE wasn't implemented for Kudu tables and silently ignored. The Kudu tables wouldn't be removed in Kudu. 8) Remove DDL delegates. There was only one functional delegate (for Kudu) the existence of the other delegate and the use of delegates in general has led to confusion. The Kudu delegate only exists to provide functionality missing from Hive. 9) Add PRIMARY KEY at the column and table level. This syntax is fairly standard. When used at the column level, only one column can be marked as a key. When used at the table level, multiple columns can be used as a key. Only Kudu tables are allowed to use PRIMARY KEY. The old "kudu.key_columns" table property is no longer accepted though it is still used internally. "PRIMARY" is now a keyword. The ident style declaration is used for "KEY" because it is also used for nested map types. 10) For managed tables, infer a Kudu table name if none was given. The table property "kudu.table_name" is optional for managed tables and is required for external tables. If for a managed table a Kudu table name is not provided, a table name will be generated based on the HMS database and table name. 11) Use Kudu master as the source of truth for table metadata instead of HMS when a table is loaded or refreshed. Table/column metadata are cached in the catalog and are stored in HMS in order to be able to use table and column statistics. Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 --- M be/src/service/frontend.cc M bin/start-catalogd.sh M bin/start-impala-cluster.py M common/thrift/CatalogObjects.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup A fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableDataSrcStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/DistributeParam.java M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java A fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java A fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/Type.java D fe/src/main/java/org/apache/impala/catalog/delegates/DdlDelegate.java D fe/src/main/java/org/apache/impala/catalog/delegates/KuduDdlDelegate.java D fe/src/main/java/org/apache/impala/catalog/delegates/UnsupportedOpDelegate.java M fe/src/main/java/org/apache/impala/planner/HdfsPartitionFilter.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java A fe/src/main/java/org/apache/impala/
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 5: (4 comments) A few minor clarifications. http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: Line 195: msClient.alter_table(msTable_.getDbName(), msTable_.getTableName(), msTable_); > Right, most of the time there will be no changes. I'm not only worried abou In theory, it should be ok given that there is a single write path to HMS for Kudu table schema. If that changes, I agree we may have to deal with some weird inconsistent states. Line 216: String colName = ToSqlUtils.getIdentSql(colSchema.getName()); > We don't parse the column names in the HMS backend table and neither does H Yeah, you're right. Fixed it in a follow up patch. http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: Line 1151: // the metadata must be fetched from the Hive Metastore. > Makes sense now that you've explained, please leave a brief comment. Done http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java: Line 85: kudu.createTable(kuduTableName, schema, tableOpts); > Agree. Can you file a JIRA against Kudu? Thanks! Done -- To view, visit http://gerrit.cloudera.org:8080/4414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 7: (63 comments) Still a few minor asks to go through (e.g. add JIRA numbers and few clarifications) but I am sending the next patch. http://gerrit.cloudera.org:8080/#/c/4414/7/be/src/service/frontend.cc File be/src/service/frontend.cc: Line 62: "value should be a comma separated list of hostnames or IP addresses."); > are ports optional or mandatory? Ports are optional. http://gerrit.cloudera.org:8080/#/c/4414/7/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: Line 358: 1: required list values > Why not a list of TExpr that are expected to be literals? Seems more future Hm, TExpr? That sounds a bit too much for simple literal values unless we envision using exprs, udfs, etc. Thoughts? Line 368: // the type parameter. > which type parameter? Sorry, there was an enum that got removed and I forgot to update the comments. Done http://gerrit.cloudera.org:8080/#/c/4414/7/common/thrift/JniCatalog.thrift File common/thrift/JniCatalog.thrift: Line 398: 14: optional list distribute_by; > for consistency let's remove trailing ";" Done http://gerrit.cloudera.org:8080/#/c/4414/7/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: Line 1033: // class doesn't inherit from CreateTableStmt. > whitespace Done Line 1047: // Used for creating external Kudu tables for which the schema is loaded from Kudu. > There seem to be more uses of this production, so this comment could be mis Done Line 1112: // or one RANGE clause > typo: clauses Done http://gerrit.cloudera.org:8080/#/c/4414/7/fe/src/main/java/org/apache/impala/analysis/ColumnDef.java File fe/src/main/java/org/apache/impala/analysis/ColumnDef.java: Line 93: void setIsPrimaryKey() { isPrimaryKey_ = true; } > do we need this? No, removed. Line 191: Preconditions.checkState(!colDefsByColName.containsKey(colDef.getColName())); > can check return value of put() Good point. Keep forgetting this. Done http://gerrit.cloudera.org:8080/#/c/4414/7/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java: Line 208:* Analyzes and checks table properties which are common for both managed and external > typo: common to Done Line 255: "PARTITIONED BY cannot be used with an Kudu table."); > typo: a Kudu table Done Line 273: AnalysisUtils.throwIfNullOrEmpty(getPrimaryKeyColumnDefs(), > Shouldn't this check hasPrimaryKey()? Done Line 284: "zero. Given number of replicas is: " + r.toString() + ".'"); > remove trailing .' or add the opening single-quote Done Line 318: private boolean hasPrimaryKey() { > Isn't it enough to check primaryKeyColDefs_ in tableDef_? You're right, we can do that since tableDef_ is analyzed. http://gerrit.cloudera.org:8080/#/c/4414/7/fe/src/main/java/org/apache/impala/analysis/DistributeParam.java File fe/src/main/java/org/apache/impala/analysis/DistributeParam.java: Line 105: for (String colName: colNames_) { > we could specify the same distribution column multiple times Not sure I follow. Are we doing something to prevent this? Line 127: throw new AnalysisException("Split values cannot be NULL"); > do we have a test for this? Hm no, added one. Thanks Line 223: colNames_.addAll(colNames); > do we need toLower()? We've already lowered the column names before the call to setColumnNames(). http://gerrit.cloudera.org:8080/#/c/4414/7/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: Line 69: // Populated during analysis. > Authoritative list of primary key column definitions populated during analy Done Line 177: fqTableName_.analyze(); > Do you know if Kudu has more permissive or more restrictive constraints on Agreed. Will do. Line 181: if (analyzer.dbContainsTable(getTblName().getDb(), getTbl(), Privilege.CREATE) > Are we going to check Sentry privs for Kudu tables? Also ok to defer this f It's still being discussed :). Lineage as well. Line 220:* Analyzes the primary key columns. Primary keys are only supported for Kudu > Replace the second sentence with a brief description what this checks. It d Done Line 234: StringBuilder columnDefStr = new StringBuilder(); > Not used? Done http://gerrit.cloudera.org:8080/#/c/4414/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: Line 1117: if (db != null && params.cascade) dropTablesFromKudu(db); > I think it might be a good idea to do this under the metastoreDdlLock_ as w Done http://gerrit.cloudera.org:8080/#/c/4414/7/fe/src/main/java/org/apache/impala/service/KuduCatalogO
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Dimitris Tsirogiannis has uploaded a new patch set (#8). Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables With this commit we simplify the syntax and handling of CREATE TABLE statements for both managed and external Kudu tables. Syntax example: CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b)) DISTRIBUTE BY HASH (a) INTO 3 BUCKETS, RANGE (b) SPLIT ROWS (('abc', 'def')) STORED AS KUDU Changes: 1) Remove the requirement to specify table properties such as key columns in tblproperties. 2) Read table schema (column definitions, primary keys, and distribution schemes) from Kudu instead of the HMS. 3) For external tables, the Kudu table is now required to exist at the time of creation in Impala. 4) Disallow table properties that could conflict with an existing table. Ex: key_columns cannot be specified. 5) Add KUDU as a file format. 6) Add a startup flag to impalad to specify the default Kudu master addresses. The flag is used as the default value for the table property kudu_master_addresses but it can still be overriden using TBLPROPERTIES. 7) Fix a post merge issue (IMPALA-3178) where DROP DATABASE CASCADE wasn't implemented for Kudu tables and silently ignored. The Kudu tables wouldn't be removed in Kudu. 8) Remove DDL delegates. There was only one functional delegate (for Kudu) the existence of the other delegate and the use of delegates in general has led to confusion. The Kudu delegate only exists to provide functionality missing from Hive. 9) Add PRIMARY KEY at the column and table level. This syntax is fairly standard. When used at the column level, only one column can be marked as a key. When used at the table level, multiple columns can be used as a key. Only Kudu tables are allowed to use PRIMARY KEY. The old "kudu.key_columns" table property is no longer accepted though it is still used internally. "PRIMARY" is now a keyword. The ident style declaration is used for "KEY" because it is also used for nested map types. 10) For managed tables, infer a Kudu table name if none was given. The table property "kudu.table_name" is optional for managed tables and is required for external tables. If for a managed table a Kudu table name is not provided, a table name will be generated based on the HMS database and table name. 11) Use Kudu master as the source of truth for table metadata instead of HMS when a table is loaded or refreshed. Table/column metadata are cached in the catalog and are stored in HMS in order to be able to use table and column statistics. Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 --- M be/src/service/frontend.cc M bin/start-catalogd.sh M bin/start-impala-cluster.py M common/thrift/CatalogObjects.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup A fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableDataSrcStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/DistributeParam.java M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java A fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java A fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/Type.java D fe/src/main/java/org/apache/impala/catalog/delegates/DdlDelegate.java D fe/src/main/java/org/apache/impala/catalog/delegates/KuduDdlDelegate.java D fe/src/main/java/org/apache/impala/catalog/delegates/UnsupportedOpDelegate.java M fe/src/main/java/org/apache/impala/planner/HdfsPartitionFilter.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java A fe/src/main/java/org/apache/impala/
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Alex Behm has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 7: (26 comments) Next round over the code. http://gerrit.cloudera.org:8080/#/c/4414/7/be/src/service/frontend.cc File be/src/service/frontend.cc: Line 62: "value should be a comma separated list of hostnames or IP addresses."); are ports optional or mandatory? http://gerrit.cloudera.org:8080/#/c/4414/7/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: Line 358: 1: required list values Why not a list of TExpr that are expected to be literals? Seems more future proof. Line 368: // the type parameter. which type parameter? http://gerrit.cloudera.org:8080/#/c/4414/7/common/thrift/JniCatalog.thrift File common/thrift/JniCatalog.thrift: Line 398: 14: optional list distribute_by; for consistency let's remove trailing ";" http://gerrit.cloudera.org:8080/#/c/4414/7/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: Line 1033: // class doesn't inherit from CreateTableStmt. whitespace Line 1047: // Used for creating external Kudu tables for which the schema is loaded from Kudu. There seem to be more uses of this production, so this comment could be misleading. Maybe generalize to something like "Used for creating tables where the schema is inferred externally, e.g., from an Avro schema, Kudu table or query statement." Line 1112: // or one RANGE clause typo: clauses http://gerrit.cloudera.org:8080/#/c/4414/7/fe/src/main/java/org/apache/impala/analysis/ColumnDef.java File fe/src/main/java/org/apache/impala/analysis/ColumnDef.java: Line 93: void setIsPrimaryKey() { isPrimaryKey_ = true; } do we need this? Line 191: Preconditions.checkState(!colDefsByColName.containsKey(colDef.getColName())); can check return value of put() http://gerrit.cloudera.org:8080/#/c/4414/7/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java: Line 208:* Analyzes and checks table properties which are common for both managed and external typo: common to Line 255: "PARTITIONED BY cannot be used with an Kudu table."); typo: a Kudu table this also needs to be checked for managed tables right? Line 273: AnalysisUtils.throwIfNullOrEmpty(getPrimaryKeyColumnDefs(), Shouldn't this check hasPrimaryKey()? Line 284: "zero. Given number of replicas is: " + r.toString() + ".'"); remove trailing .' or add the opening single-quote Line 318: private boolean hasPrimaryKey() { Isn't it enough to check primaryKeyColDefs_ in tableDef_? http://gerrit.cloudera.org:8080/#/c/4414/7/fe/src/main/java/org/apache/impala/analysis/DistributeParam.java File fe/src/main/java/org/apache/impala/analysis/DistributeParam.java: Line 105: for (String colName: colNames_) { we could specify the same distribution column multiple times Line 127: throw new AnalysisException("Split values cannot be NULL"); do we have a test for this? Line 223: colNames_.addAll(colNames); do we need toLower()? http://gerrit.cloudera.org:8080/#/c/4414/7/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: Line 69: // Populated during analysis. Authoritative list of primary key column definitions populated during analysis. Line 177: fqTableName_.analyze(); Do you know if Kudu has more permissive or more restrictive constraints on what strings can be used as table/column names? I'd be surprised if HMS and Kudu were identical in that respect. Better to file a JIRA and leave that investigation/fix out of this patch. Line 181: if (analyzer.dbContainsTable(getTblName().getDb(), getTbl(), Privilege.CREATE) Are we going to check Sentry privs for Kudu tables? Also ok to defer this fix, but let's not forget. Line 220:* Analyzes the primary key columns. Primary keys are only supported for Kudu Replace the second sentence with a brief description what this checks. It does not check the format and succeeds if no primary keys are given, so there is nothing Kudu specific here (that is checked in CreateTableStmt). Line 234: StringBuilder columnDefStr = new StringBuilder(); Not used? http://gerrit.cloudera.org:8080/#/c/4414/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: Line 1117: if (db != null && params.cascade) dropTablesFromKudu(db); I think it might be a good idea to do this under the metastoreDdlLock_ as well to ensure that the Kudu and HMS table deletions are atomic. http://gerrit.cloudera.org:8080/#/c/4414/7/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java: Line 47: /*
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Alex Behm has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 5: (8 comments) Responses to comments. Starting next round on code. http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: Line 195: msClient.alter_table(msTable_.getDbName(), msTable_.getTableName(), msTable_); > Good point. We can potentially avoid this by checking if there has been som Right, most of the time there will be no changes. I'm not only worried about the perf, but all the extra weird states we can be in when doing this extra RPC to modify the HMS. Line 216: String colName = ToSqlUtils.getIdentSql(colSchema.getName()); > Hm, can you explain why this is the case given that we will use this name t We don't parse the column names in the HMS backend table and neither does Hive. The backticks are only needed to tell our parser to interpret a token as an identifier, but the parser is never invoked for column names from the HMS. I recommend trying this out with some Impala keywords to see what happens. Line 223: numClusteringCols_ = primaryKeyColumnNames_.size(); > Actually, I removed this and left a TODO. There are a few places that inter Yea, there's definitely some inconsistency here. http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: Line 1151: // the metadata must be fetched from the Hive Metastore. > In order to drop a table from Kudu we need the Kudu table name which is sto Makes sense now that you've explained, please leave a brief comment. Line 1172: if (!KuduTable.isKuduTable(msTable)) continue; > Well, it's an easy check we can do to avoid RPC calls to Kudu for dropping Got it, thanks! http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java: Line 85: kudu.createTable(kuduTableName, schema, tableOpts); > I see what you're saying but I don't think this will work. We need to be ab Agree. Can you file a JIRA against Kudu? Thanks! Line 178: if (kudu.tableExists(tableName)) { > I get it. See my comment above and let me know what you think. I agree with you. Line 214: String colName = ToSqlUtils.getIdentSql(colSchema.getName()); > I'd appreciate it if you could explain why is it wrong to use this here and Explained in other comment, let me know if still unclear :) -- To view, visit http://gerrit.cloudera.org:8080/4414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Alex Behm has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 7: (29 comments) http://gerrit.cloudera.org:8080/#/c/4414/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: Line 1350: // CTAS in an external Kudu table What's the rationale for not allowing this? Line 1716: AnalyzesOk("create table tab (x int, y string, primary key(x, y)) " + Can you add a brief comment explaining the table layout for this one? Line 1717: "distribute by hash(x) into 3 buckets, range(y) split rows " + can we distribute by range, hash? Line 1723: AnalyzesOk("create table tab (a int, b int, c int, d int, primary key (a, b, c))" + What happens if I specify PARTITION BY for a Kudu table? Line 1739: AnalysisError(String.format("create table tab (x int) distribute by hash (x) " + What about specifying the other params in tblproperties like the distribute by and split rows. Line 1748: AnalysisError("create table tab (x int primary key, primary key(x)) stored " + Add negative test where two ColumnDefs are declared as PK. Line 1762: AnalysisError("create table tab (a int, b int, c int, d int, primary key(a, b, c)) " + Do we have tests somewhere for checking which types are supported with Kudu? We should make sure that: * you can create a table with all supported types (and same for the specific clauses like primary key, distributed by, etc.) * you cannot create tables with unsupported types like TIMESTAMP/DECIMAL and nested types (should fail gracefully with "not supported") * or alternatively, if we want to defer the type checks to Kudu (and not bake it into Impala analysis), then we should document that somewhere Line 1772: // No float split keys Even if the column type is float? If so, might want to test that. Line 1810: // DISTRIBUTE BY is required for managed tables. Primary key is also required, do we have a test? Line 1812: "Table partitioning must be specified for managed Kudu tables."); Let's rephrase this as "Table distribution must be specified ..." PS6, Line 1828: AnalysisError("create external table t tblproperties (" + Not that it makes any sense, but hat happens with: create external table t (primary key()) ... Line 1863: public void TestCreateAvroTest() { Add a test with some of the Kudu-specific clauses and STORED AS AVRO Line 2822: public void TestShowFiles() throws AnalysisException { DO we have a TODO/JIRA somewhere to go through all DDL/SHOW commands and make them behave properly for Kudu? http://gerrit.cloudera.org:8080/#/c/4414/6/fe/src/test/java/org/apache/impala/analysis/ParserTest.java File fe/src/test/java/org/apache/impala/analysis/ParserTest.java: Line 1632 Why remove this? It will break my setup :) Line 2235: ParsesOk("CREATE TABLE foo (i INT PRIMARY KEY, PRIMARY KEY(i)) STORED AS KUDU"); Add case like this: CREATE TABLE foo (i INT PRIMARY KEY, j PRIMARY KEY) STORED AS KUDU Line 2541: ParsesOk("CREATE TABLE Foo TBLPROPERTIES ('a'='b', 'c'='d') AS SELECT * from bar"); nit: let's be consistent with how we chop up string literals across lines (space at end of previous line xor beginning of next line) http://gerrit.cloudera.org:8080/#/c/4414/6/fe/src/test/java/org/apache/impala/catalog/HdfsStorageDescriptorTest.java File fe/src/test/java/org/apache/impala/catalog/HdfsStorageDescriptorTest.java: Line 58: "org.apache.impala.hive.serde.ParquetInputFormat", revert http://gerrit.cloudera.org:8080/#/c/4414/6/testdata/bin/generate-schema-statements.py File testdata/bin/generate-schema-statements.py: Line 203: if row_format and file_format == 'text': I think row_format is also valid for sequence files (maybe we're not using it though) http://gerrit.cloudera.org:8080/#/c/4414/6/testdata/datasets/functional/functional_schema_template.sql File testdata/datasets/functional/functional_schema_template.sql: Line 823: distribute by range(id) split rows ((1003), (1007)) stored as kudu; ah so much better http://gerrit.cloudera.org:8080/#/c/4414/6/testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test File testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test: Line 218: create table kudu_table_clone like functional_kudu.alltypes Why can't we check this in analysis? http://gerrit.cloudera.org:8080/#/c/4414/6/testdata/workloads/functional-query/queries/QueryTest/kudu_create.test File testdata/workloads/functional-query/queries/QueryTest/kudu_create.test: Line 3: # This test file contains several cases for what basically amount to analysis errors, Should mention that some of this behavior is intentional and why. Line 11: ImpalaRuntimeException: Type TIMESTAMP is not supported in Kudu Do we get the same message
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 5: (41 comments) http://gerrit.cloudera.org:8080/#/c/4414/5//COMMIT_MSG Commit Message: Line 45:"KEY" is expected to be common enough that the ident style > Might also want to mention that "key" is used for nested map types, so it's Done Line 51:on the HMS database and table name. If the database is "default", > Does this last regarding "default" make the code easier or more complicated Not sure why the special case. Removing it unless someone objects. http://gerrit.cloudera.org:8080/#/c/4414/5/be/src/service/frontend.cc File be/src/service/frontend.cc: Line 61: DEFINE_string(kudu_master_addrs, "", "Specifies the default Kudu master(s). The given " > Can we make this more consistent with out existing options? For example: Done http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/analysis/ColumnDef.java File fe/src/main/java/org/apache/impala/analysis/ColumnDef.java: Line 176: static List toColumnNames(Collection columnDefs) { > colDefs Done Line 184: static Map mapByColumnNames(Collection colDefs) { > comment that colDefs is assumed have no duplicate names, or alternatively d Done http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java: Line 99: "does not support (%s) file format. Supported formats are: (%s)", > the (%s) file format. Done Line 100: createStmt_.getFileFormat().toString().replace("_", ""), > what's with this weird "_" replacement? I believe the intention was to make the file format name from THdfsFileFormat consistent with the file format name as specified in a "STORED AS" clause. RC_FILE is RCFILE, TEXT is TEXTFILE. So, as you can tell we aren't very consistent here. That's also the reason for hardcoding the values in L101. Left a TODO in the CatalogObjects.thrift for now if that's ok with you. Line 101: "PARQUET, TEXTFILE, KUDU")); > use SUPPORTED_INSERT_FORMATS instead of hardcoding See comment above. http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java: Line 80: private void setColumnDefs(List columnDefs) { > nit: colDefs everywhere Done Line 95: Map getTblProperties() { return tableDef_.getTblProperties(); } > is there any significance to not adding public/private to some of these met Casey started using the default access modifier that essentially gives public access only to classes in the same package. In theory, we should be using the most restrictive access modifier, so I kind of like this. For sure the codebase is not consistent with using this rule, so if you think we shouldn't be doing this I'll make them public. Line 192: "Only Kudu tables can use DISTRIBUTE BY clause."); > the DISTRIBUTE BY clause Done http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: Line 43: * Represents the table parameters in a CREATE TABLE statement. > Let's be more precise and list the clauses that this captures. Done http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: Line 97: // Table name in Kudu storage engine. It may not neccessarily be the same as the > the Kudu storage engine Done Line 101: // 1. For managed tables, 'kuduTableName_' is prefixed with '__IMPALA__' to > Can we make the prefix 'impala::' instead? Sure no prob. Done Line 154:* are loaded from HMS. The function also updates the table schema in HMS. > Add a comment about why we also update the HMS. My understanding is that we Done Line 195: msClient.alter_table(msTable_.getDbName(), msTable_.getTableName(), msTable_); > This means every invalidate/refresh will do an extra alter to HMS. Are we c Good point. We can potentially avoid this by checking if there has been some change in the schema. At this point, I am not so terribly worried about this. I'll keep a note to actually profile it and see if it's worth optimizing that path. Line 202:* Loads the schema from the Kudu table including column definitions and primary key > mention that it loads the schema into this table as well as the HMS table Done Line 216: String colName = ToSqlUtils.getIdentSql(colSchema.getName()); > getIdentSql() may add backticks, I don't think we want that here Hm, can you explain why this is the case given that we will use this name to populate the columns in the HMS table? Line 223: numClus
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Dimitris Tsirogiannis has uploaded a new patch set (#7). Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables With this commit we simplify the syntax and handling of CREATE TABLE statements for both managed and external Kudu tables. Syntax example: CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b)) DISTRIBUTE BY HASH (a) INTO 3 BUCKETS, RANGE (b) SPLIT ROWS (('abc', 'def')) STORED AS KUDU Changes: 1) Remove the requirement to specify table properties such as key columns in tblproperties. 2) Read table schema (column definitions, primary keys, and distribution schemes) from Kudu instead of the HMS. 3) For external tables, the Kudu table is now required to exist at the time of creation in Impala. 4) Disallow table properties that could conflict with an existing table. Ex: key_columns cannot be specified. 5) Add KUDU as a file format. 6) Add a startup flag to impalad to specify the default Kudu master addresses. The flag is used as the default value for the table property kudu_master_addresses but it can still be overriden using TBLPROPERTIES. 7) Fix a post merge issue (IMPALA-3178) where DROP DATABASE CASCADE wasn't implemented for Kudu tables and silently ignored. The Kudu tables wouldn't be removed in Kudu. 8) Remove DDL delegates. There was only one functional delegate (for Kudu) the existence of the other delegate and the use of delegates in general has led to confusion. The Kudu delegate only exists to provide functionality missing from Hive. 9) Add PRIMARY KEY at the column and table level. This syntax is fairly standard. When used at the column level, only one column can be marked as a key. When used at the table level, multiple columns can be used as a key. Only Kudu tables are allowed to use PRIMARY KEY. The old "kudu.key_columns" table property is no longer accepted though it is still used internally. "PRIMARY" is now a keyword. The ident style declaration is used for "KEY" because it is also used for nested map types. 10) For managed tables, infer a Kudu table name if none was given. The table property "kudu.table_name" is optional for managed tables and is required for external tables. If for a managed table a Kudu table name is not provided, a table name will be generated based on the HMS database and table name. 11) Use Kudu master as the source of truth for table metadata instead of HMS when a table is loaded or refreshed. Table/column metadata are cached in the catalog and are stored in HMS in order to be able to use table and column statistics. Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 --- M be/src/service/frontend.cc M bin/start-catalogd.sh M bin/start-impala-cluster.py M common/thrift/CatalogObjects.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup A fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableDataSrcStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/DistributeParam.java M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java A fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java A fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/Type.java D fe/src/main/java/org/apache/impala/catalog/delegates/DdlDelegate.java D fe/src/main/java/org/apache/impala/catalog/delegates/KuduDdlDelegate.java D fe/src/main/java/org/apache/impala/catalog/delegates/UnsupportedOpDelegate.java M fe/src/main/java/org/apache/impala/planner/HdfsPartitionFilter.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java A fe/src/main/java/org/apache/impala/
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Michael Brown has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 6: (10 comments) http://gerrit.cloudera.org:8080/#/c/4414/6/tests/common/kudu_test_suite.py File tests/common/kudu_test_suite.py: PS6, Line 47: super(KuduTestSuite, cls).setup_class() : if os.environ["KUDU_IS_SUPPORTED"] == "false": : pytest.skip("Kudu is not supported") Maybe move L47 to L50 (switch the ordering of these)? This isn't just a nit: it avoids running super.setup_class(), which is non-trivial, and then just deciding to skip after all. PS6, Line 51: @classmethod : def teardown_class(cls): : pass Not needed: 1. KuduTestSuite.setup_class() doesn't do anything that needs to be torn down here. 2. This overrides super.teardown_class() and doesn't call super.teardown_class(). super.teardown_class() (ImpalaTestSuite.teardown_class()) does do some work. So it seems safe to delete this method completely. http://gerrit.cloudera.org:8080/#/c/4414/6/tests/custom_cluster/test_kudu.py File tests/custom_cluster/test_kudu.py: PS6, Line 20: from kudu.client import INT32 ImportError: I think this needs to be kudu.schema PS6, Line 34: @classmethod : def add_test_dimensions(cls): : super(CustomClusterTestSuite, cls).add_test_dimensions() : cls.TestMatrix.add_constraint(lambda v: : v.get_value('table_format').file_format == 'parquet' and : v.get_value('table_format').compression_codec == 'none') I don't think this is needed: no tests here use a vector. PS6, Line 43: def test_kudu_master_addrs(self, cursor, kudu): kudu_client, not kudu. Note this isn't just a rename; it's to make use of the kudu_client fixture. There is no "kudu" fixture. PS6, Line 45: with self.temp_kudu_table(kudu, [INT32]) as kudu_table: kudu_client PS6, Line 50: props) SyntaxError: missing a closing paren here http://gerrit.cloudera.org:8080/#/c/4414/6/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: PS6, Line 27: from pprint import pprint Remove: unused import PS6, Line 30: from copy import copy : : from tests.beeswax.impala_beeswax import ImpalaBeeswaxException : from tests.common.impala_test_suite import ImpalaTestSuite : from tests.common.skip import SkipIf : from tests.common.test_dimensions import create_uncompressed_text_dimension : from tests.common.test_vector import TestDimension All imported but not used. PS6, Line 49: def test_kudu_scan_node(self, vector): This test is racy in exhaustive: two tests will be competing for the same database (kududb_test). You could play it safe and run it serially, but with a little bit of work, this test can be run in parallel if you 1. use the unique_database fixture L49 2. use_db=unique_database L52 3. Modify QueryTest/kudu-scan-node to access dimtbl using the fully qualified path (functional_kudu.dimtbl) 4. Remove all uses of kududb_test here and in kudu-scan-node.test https://gerrit.cloudera.org/#/c/4169/ shows some of the same principles as an example. -- To view, visit http://gerrit.cloudera.org:8080/4414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Alex Behm has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 5: (41 comments) Sending out the first wave. http://gerrit.cloudera.org:8080/#/c/4414/5//COMMIT_MSG Commit Message: Line 45:"KEY" is expected to be common enough that the ident style Might also want to mention that "key" is used for nested map types, so it's not desirable to make it a keyword. Line 51:on the HMS database and table name. If the database is "default", Does this last regarding "default" make the code easier or more complicated? What's the motivation for special casing it, i.e., what's the harm in including the "default" database name? http://gerrit.cloudera.org:8080/#/c/4414/5/be/src/service/frontend.cc File be/src/service/frontend.cc: Line 61: DEFINE_string(kudu_master_addrs, "", "Specifies the default Kudu master(s). The given " Can we make this more consistent with out existing options? For example: -catalog_service_host -state_store_host http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/analysis/ColumnDef.java File fe/src/main/java/org/apache/impala/analysis/ColumnDef.java: Line 176: static List toColumnNames(Collection columnDefs) { colDefs Line 184: static Map mapByColumnNames(Collection colDefs) { comment that colDefs is assumed have no duplicate names, or alternatively deal with that case http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java: Line 99: "does not support (%s) file format. Supported formats are: (%s)", the (%s) file format. Line 100: createStmt_.getFileFormat().toString().replace("_", ""), what's with this weird "_" replacement? Line 101: "PARQUET, TEXTFILE, KUDU")); use SUPPORTED_INSERT_FORMATS instead of hardcoding http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java: Line 80: private void setColumnDefs(List columnDefs) { nit: colDefs everywhere Line 95: Map getTblProperties() { return tableDef_.getTblProperties(); } is there any significance to not adding public/private to some of these methods? Line 192: "Only Kudu tables can use DISTRIBUTE BY clause."); the DISTRIBUTE BY clause http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: Line 43: * Represents the table parameters in a CREATE TABLE statement. Let's be more precise and list the clauses that this captures. http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: Line 97: // Table name in Kudu storage engine. It may not neccessarily be the same as the the Kudu storage engine Line 101: // 1. For managed tables, 'kuduTableName_' is prefixed with '__IMPALA__' to Can we make the prefix 'impala::' instead? Better to keep the "impala" part lowercase for consistency. It might be a minor detail, but keeping the name readable seems important. I'd imagine that Kudu tables will be accessed from several tools. Line 154:* are loaded from HMS. The function also updates the table schema in HMS. Add a comment about why we also update the HMS. My understanding is that we want to propagate alterations made to the Kudu table to the HMS. Line 195: msClient.alter_table(msTable_.getDbName(), msTable_.getTableName(), msTable_); This means every invalidate/refresh will do an extra alter to HMS. Are we concerned about the extra load? Line 202:* Loads the schema from the Kudu table including column definitions and primary key mention that it loads the schema into this table as well as the HMS table Line 216: String colName = ToSqlUtils.getIdentSql(colSchema.getName()); getIdentSql() may add backticks, I don't think we want that here Line 223: numClusteringCols_ = primaryKeyColumnNames_.size(); Check that there is at least one PK column Line 228: PartitionSchema partitionSchema = kuduTable.getPartitionSchema(); Preconditions.checkState(!colsByPos_.isEmpty()); ? Line 254: public static KuduTable createTmpTable(Db db, createCtasTarget()? Seems more explicit since there really is only one use. http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: Line 1117: if (db != null) dropTablesFromKudu(db); Won't this drop all Kudu tables even if CASCADE is not specified? Line 1151: // the metadata must be fetched from the Hive Metastore. Why do we
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 5: (15 comments) http://gerrit.cloudera.org:8080/#/c/4414/5/bin/start-impala-cluster.py File bin/start-impala-cluster.py: PS5, Line 73: parser.add_option("--kudu_masters", default="127.0.0.1", > Could we import the default value from tests.common instead of hardcoding i Good point. Done http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/analysis/DistributeParam.java File fe/src/main/java/org/apache/impala/analysis/DistributeParam.java: Line 129: if (!org.apache.impala.catalog.Type.areAssignmentCompatibleTypes(colType, > you want to be able to assign the split value to the column type with just Done http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: PS4, Line 226: private void loadDistributeByParams(org.apache.kudu.client.KuduTable kuduTable) { : Preconditions > yes, please leave a comment that you're resetting the mstable cols if there Done PS4, Line 226: private void loadDistributeByParams(org.apache.kudu.client.KuduTable kuduTable) { : Preconditions > I see. yes, a brief comment would be helpful. Done http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: Line 157: public void load(boolean reuseMetadata /* not used */, IMetaStoreClient msClient, > rename reuseMetadata to dummy Good point. Done PS5, Line 193: msTable_.putToParameters(StatsSetupConst.DO_NOT_UPDATE_STATS, : StatsSetupConst.TRUE); > This also needs to be stripped out later for SHOW CREATE TABLE. Done Line 217: cols.add(new FieldSchema(colName, type.toSql().toLowerCase(), null)); > this also updates msTable_ Added a comment to clarify it. Currently, we need to keep the schema in HMS consistent with the one in Kudu in order to use the column statistics. Once we move away from this and Kudu provides us with proper statistics API we will get rid of this annoying dance around msTable_. Line 295: TDistributeByRangeParam rangeParam = param.getBy_range_param(); > checkstate(isset...) Done http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: PS5, Line 1484: // Add the table to the catalog cache : Table newTbl = catalog_.addTable(newTable.getDbName(), newTable.getTableName()); : addTableToCatalogUpdate(newTbl, response.result); > If something here throws (e.g. addTable() looks like it could throw), we pr Actually, addTable should not be declared to throw because none of its operations throw an exception. addTableToCatalogUpdate also doesn't throw, so I believe it's safe to assume that we don't need to rollback the hms change here. Modified the addTable() function in CatalogServiceCatalog. http://gerrit.cloudera.org:8080/#/c/4414/5/tests/common/kudu_test_suite.py File tests/common/kudu_test_suite.py: PS5, Line 38: @SkipIf.kudu_not_supported > This is extremely risky due to bugs in our version of old pytest. See IMPAL Done. Let me know is that's ok now. http://gerrit.cloudera.org:8080/#/c/4414/5/tests/custom_cluster/test_kudu.py File tests/custom_cluster/test_kudu.py: PS5, Line 1: # Copyright (c) 2016 Cloudera, Inc. All rights reserved. > Remove Cloudera copyright and make sure the license text is correct otherwi Done http://gerrit.cloudera.org:8080/#/c/4414/5/tests/metadata/test_ddl.py File tests/metadata/test_ddl.py: PS5, Line 220: > Why remove this skip? Hm, that wasn't intentional. I put it back. PS5, Line 221: > Is this removal safe? Same here. PS5, Line 221: self.expected_exceptions = 2 > I can't see where this is used; delete? Done http://gerrit.cloudera.org:8080/#/c/4414/5/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: PS5, Line 87: print("Describe formatted output:") : pprint(table_desc) > Use LOG.info() Done -- To view, visit http://gerrit.cloudera.org:8080/4414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Dimitris Tsirogiannis has uploaded a new patch set (#6). Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables With this commit we simplify the syntax and handling of CREATE TABLE statements for both managed and external Kudu tables. Syntax example: CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b)) DISTRIBUTE BY HASH (a) INTO 3 BUCKETS, RANGE (b) SPLIT ROWS (('abc', 'def')) STORED AS KUDU Changes: 1) Remove the requirement to specify table properties such as key columns in tblproperties. 2) Read table schema (column definitions, primary keys, and distribution schemes) from Kudu instead of the HMS. 3) For external tables, the Kudu table is now required to exist at the time of creation in Impala. 4) Disallow table properties that could conflict with an existing table. Ex: key_columns cannot be specified. 5) Add KUDU as a file format. 6) Add a startup flag to impalad to specify the default Kudu master addresses. The flag is used as the default value for the table property kudu_master_addresses but it can still be overriden using TBLPROPERTIES. 7) Fix a post merge issue (IMPALA-3178) where DROP DATABASE CASCADE wasn't implemented for Kudu tables and silently ignored. The Kudu tables wouldn't be removed in Kudu. 8) Remove DDL delegates. There was only one functional delegate (for Kudu) the existence of the other delegate and the use of delegates in general has led to confusion. The Kudu delegate only exists to provide functionality missing from Hive. 9) Add PRIMARY KEY at the column and table level. This syntax is fairly standard. When used at the column level, only one column can be marked as a key. When used at the table level, multiple columns can be used as a key. Only Kudu tables are allowed to use PRIMARY KEY. The old "kudu.key_columns" table property is no longer accepted though it is still used internally. "PRIMARY" is now a keyword. "KEY" is expected to be common enough that the ident style declaration is used instead to avoid conflicts. 10) For managed tables, infer a Kudu table name if none was given. The table property "kudu.table_name" is optional for managed tables and is required for external tables. If for a managed table a Kudu table name is not provided, a table name will be generated based on the HMS database and table name. If the database is "default", then a Kudu table name will not include the database. 11) Use Kudu master as the source of truth for table metadata instead of HMS when a table is loaded or refreshed. Table/column metadata are cached in the catalog and are stored in HMS in order to be able to use table and column statistics. Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 --- M be/src/service/frontend.cc M bin/start-catalogd.sh M bin/start-impala-cluster.py M common/thrift/CatalogObjects.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup A fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableDataSrcStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/DistributeParam.java M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java A fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java A fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/Type.java D fe/src/main/java/org/apache/impala/catalog/delegates/DdlDelegate.java D fe/src/main/java/org/apache/impala/catalog/delegates/KuduDdlDelegate.java D fe/src/main/java/org/apache/impala/catalog/delegates/UnsupportedOpDelegate.java M fe/src/main/java/org/apache/impala/planner/HdfsPartitionFilter.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/main/java/org/apache/impala/service/JniFrontend.jav
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: PS5, Line 193: msTable_.putToParameters(StatsSetupConst.DO_NOT_UPDATE_STATS, : StatsSetupConst.TRUE); This also needs to be stripped out later for SHOW CREATE TABLE. -- To view, visit http://gerrit.cloudera.org:8080/4414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 5: (5 comments) http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/analysis/DistributeParam.java File fe/src/main/java/org/apache/impala/analysis/DistributeParam.java: Line 129: if (!org.apache.impala.catalog.Type.areAssignmentCompatibleTypes(colType, > Done you want to be able to assign the split value to the column type with just an implicit cast, if any. in other words, for a bigint col it's okay to have a split value of 127, but for a tinyint col, you can't have a split value of maxint. you can call isImplicitlyCastable(exprType, colType). http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: PS4, Line 226: private void loadDistributeByParams(org.apache.kudu.client.KuduTable kuduTable) { : Preconditions > cols is a reference to msTable cols. We clear them here and reload them fro yes, please leave a comment that you're resetting the mstable cols if there are any. i tripped over that as well. http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: Line 157: public void load(boolean reuseMetadata /* not used */, IMetaStoreClient msClient, rename reuseMetadata to dummy Line 217: cols.add(new FieldSchema(colName, type.toSql().toLowerCase(), null)); this also updates msTable_ document this side effect in the function comment. this feels pretty convoluted and is hard to follow. is this really necessary or should we simply ignore the column info stored in mstable? Line 295: TDistributeByRangeParam rangeParam = param.getBy_range_param(); checkstate(isset...) -- To view, visit http://gerrit.cloudera.org:8080/4414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 5: (4 comments) http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: PS4, Line 175: } > Done While it does seem related, does it really mean the same thing? Either way, we have to be sure we're handling it correctly everywhere. E.g. does Table.getClusteringColumns(), getNonClusteringColumns(), work as expected? They also get stored in the thrift TTable differently, see Table.java: // Clustering columns come first. if (i < numClusteringCols_) { table.addToClustering_columns(colDesc); } else { table.addToColumns(colDesc); } Further, Table.getNumClusteringCols() exposes this and it is used in a lot of places. E.g. TupleDescriptor.java: public boolean hasClusteringColsOnly() { Table table = getTable(); if (!(table instanceof HdfsTable) || table.getNumClusteringCols() == 0) return false; ... There are a ton of other places to check. PS4, Line 226: private void loadDistributeByParams(org.apache.kudu.client.KuduTable kuduTable) { : Preconditions > cols is a reference to msTable cols. We clear them here and reload them fro I see. yes, a brief comment would be helpful. http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: PS5, Line 1484: // Add the table to the catalog cache : Table newTbl = catalog_.addTable(newTable.getDbName(), newTable.getTableName()); : addTableToCatalogUpdate(newTbl, response.result); If something here throws (e.g. addTable() looks like it could throw), we probably need to roll back the hiveClient.createTable() call, right? http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java: PS4, Line 135: tableOpts.setRangePartitionColumns( : distParam.getBy_range_param().getColumns()); : > Unfortunately it is. I spoke to Dan (from Kudu team) about it. If the user Thanks, I followed up with an e-mail to Dan to make sure I understand the Kudu behavior. -- To view, visit http://gerrit.cloudera.org:8080/4414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Michael Brown has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 5: (7 comments) http://gerrit.cloudera.org:8080/#/c/4414/5/bin/start-impala-cluster.py File bin/start-impala-cluster.py: PS5, Line 73: parser.add_option("--kudu_masters", default="127.0.0.1", Could we import the default value from tests.common instead of hardcoding it? http://gerrit.cloudera.org:8080/#/c/4414/5/tests/common/kudu_test_suite.py File tests/common/kudu_test_suite.py: PS5, Line 38: @SkipIf.kudu_not_supported This is extremely risky due to bugs in our version of old pytest. See IMPALA-3614 for an example of how this is risky. I think you will have to work around this by skipping in setup_class. http://gerrit.cloudera.org:8080/#/c/4414/5/tests/custom_cluster/test_kudu.py File tests/custom_cluster/test_kudu.py: PS5, Line 1: # Copyright (c) 2016 Cloudera, Inc. All rights reserved. Remove Cloudera copyright and make sure the license text is correct otherwise. http://gerrit.cloudera.org:8080/#/c/4414/5/tests/metadata/test_ddl.py File tests/metadata/test_ddl.py: PS5, Line 220: Why remove this skip? PS5, Line 221: Is this removal safe? PS5, Line 221: self.expected_exceptions = 2 I can't see where this is used; delete? http://gerrit.cloudera.org:8080/#/c/4414/5/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: PS5, Line 87: print("Describe formatted output:") : pprint(table_desc) Use LOG.info() -- To view, visit http://gerrit.cloudera.org:8080/4414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Dimitris Tsirogiannis has uploaded a new patch set (#5). Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables With this commit we simplify the syntax and handling of CREATE TABLE statements for both managed and external Kudu tables. Syntax example: CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b)) DISTRIBUTE BY HASH (a) INTO 3 BUCKETS, RANGE (b) SPLIT ROWS (('abc', 'def')) STORED AS KUDU Changes: 1) Remove the requirement to specify table properties such as key columns in tblproperties. 2) Read table schema (column definitions, primary keys, and distribution schemes) from Kudu instead of the HMS. 3) For external tables, the Kudu table is now required to exist at the time of creation in Impala. 4) Disallow table properties that could conflict with an existing table. Ex: key_columns cannot be specified. 5) Add KUDU as a file format. 6) Add a startup flag to impalad to specify the default Kudu master addresses. The flag is used as the default value for the table property kudu_master_addresses but it can still be overriden using TBLPROPERTIES. 7) Fix a post merge issue (IMPALA-3178) where DROP DATABASE CASCADE wasn't implemented for Kudu tables and silently ignored. The Kudu tables wouldn't be removed in Kudu. 8) Remove DDL delegates. There was only one functional delegate (for Kudu) the existence of the other delegate and the use of delegates in general has led to confusion. The Kudu delegate only exists to provide functionality missing from Hive. 9) Add PRIMARY KEY at the column and table level. This syntax is fairly standard. When used at the column level, only one column can be marked as a key. When used at the table level, multiple columns can be used as a key. Only Kudu tables are allowed to use PRIMARY KEY. The old "kudu.key_columns" table property is no longer accepted though it is still used internally. "PRIMARY" is now a keyword. "KEY" is expected to be common enough that the ident style declaration is used instead to avoid conflicts. 10) For managed tables, infer a Kudu table name if none was given. The table property "kudu.table_name" is optional for managed tables and is required for external tables. If for a managed table a Kudu table name is not provided, a table name will be generated based on the HMS database and table name. If the database is "default", then a Kudu table name will not include the database. 11) Use Kudu master as the source of truth for table metadata instead of HMS when a table is loaded or refreshed. Table/column metadata are cached in the catalog and are stored in HMS in order to be able to use table and column statistics. Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 --- M be/src/service/frontend.cc M bin/start-catalogd.sh M bin/start-impala-cluster.py M common/thrift/CatalogObjects.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup A fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableDataSrcStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/DistributeParam.java M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java A fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java A fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/Type.java D fe/src/main/java/org/apache/impala/catalog/delegates/DdlDelegate.java D fe/src/main/java/org/apache/impala/catalog/delegates/KuduDdlDelegate.java D fe/src/main/java/org/apache/impala/catalog/delegates/UnsupportedOpDelegate.java M fe/src/main/java/org/apache/impala/planner/HdfsPartitionFilter.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java A fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.jav
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 4: (54 comments) http://gerrit.cloudera.org:8080/#/c/4414/4/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: Line 53: enum THdfsFileFormat { > rename This change would touch many places. Would you mind postponing it for a follow up patch? http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: Line 976: tbl_def_without_col_defs:tbl_def > a 'create table' without col defs? This was added for the EXTERNAL Kudu table use case for which no column definitions are specified since we load the schema from the Kudu table. Added a comment. Line 980: RESULT = new CreateTableStmt(tbl_def); > trailing Done Line 1033: // class doesn't inherit from CreateTableStmt. > should it? To my opinion yes it should. I actually went down the path of refactoring all the CREATE TABLE* statements but it ended up being too complex to add on top of this big patch. Simplifying the CREATE TABLE statements will also allow us to remove some of the weird table option handling we do in TableDef.java. I will leave a TODO for now. Line 1065: primary_keys_val ::= > opt_primary_keys? Done Line 1089: tbl_data_layout ::= > opt_...? Done Line 1139: {: > fix spaces and tabs Done Line 1370: KW_PRIMARY key_ident > what's wrong with KW_KEY? I don't think we can do that. Don't we use "key" for nested types (map)? http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java File fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java: Line 30: static void throwIfNotNullOrNotEmpty(Collection c, String message) > this is actually 'not null *and* not empty'. you can also phrase that as 'n Good point. Done http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java: Line 158: if (fileFormat_ == THdfsFileFormat.KUDU) { > check at top Done http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java: Line 236: String.format("PRIMARY KEY must be used instead of the table property '%s'.", > not good: if you do that on an external table, you get this error message i I like that idea but it's a bit more complicated for Kudu because different properties are valid depending on whether it's an external or managed table. Moved the check to the function below. Let me know if that works ok. Line 310: distributeParam.setPKColumnDefMap(pkColumnDefsByName); > setPkColumn... Done Line 315: private boolean hasPrimaryKeysSpecified() { > hasPrimaryKeySpecified (there's only one, which can be a composite key) Done http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/analysis/DistributeParam.java File fe/src/main/java/org/apache/impala/analysis/DistributeParam.java: Line 121: org.apache.impala.catalog.Type colType = colDef.getType(); > does simply Type conflict with something? Yeah, there is a conflict with the enum Type in this class. Line 129: if (colType.isStringType() && !exprType.isStringType() > this is basically looking for 'assignment compatible', and i'm sure we alre Done Line 150: builder.append(numBuckets_).append(" BUCKETS"); > sprinkle some checkstates in here (on numbuckets and splitrows; or maybe a I added checks for numBuckets_. Split rows will go away in a follow up patch with the new range partitioning syntax. I left a TODO to add a validate function then. Line 200: literal.setString_literal(expr.getStringValue()); > checkstate that you're getting something valid Done Line 211: void setPKColumnDefMap(Map pkColumnDefByName) { > setPkC... Done http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: Line 74: static class TableDefOptions { > 'Options' is enough Done Line 160: fullyQualifiedTableName_ = analyzer.getFqTableName(getTblName()); > stick with fq abbreviation? Done Line 189: for (ColumnDef colDef: getPartitionColumnDefs()) { > this is a bit hard to follow. partition cols aren't defined separately, the These are the columns specified in a PARTITIONED BY clause (non-kudu) and they should be analyzed, no? Sorry, I am not sure I follow your comment. http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java File fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java: PS4, Line 97: org.apache > should this have changed? Hm sed is not that s
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Michael Brown has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/4414/1/tests/common/kudu_test_suite.py File tests/common/kudu_test_suite.py: PS1, Line 54: def auto_create_db(cls): : return True : : @classmethod : def get_db_name(cls): : # When py.test runs with the xdist plugin, several processes are started and each : # process runs some partition of the tests. It's possible that multiple processes : # will call this method. A random value is generated so the processes won't try : # to use the same database at the same time. The value is cached so within a single : # process the same database name is always used for the class. This doesn't need to : # be thread-safe since multi-threading is never used. : if not cls.__DB_NAME > Duno. I'll check with Michael. To answer this: It looks like this provides the same functionality as unique_database, but over a different channel (the conn/cursor fixtures in conftest.py). The design of KuduTestSuite and dependents are so different than most of our test cases. While switching to unique_database can be done, it will be nontrivial, and would require a bit of a change in test design. -- To view, visit http://gerrit.cloudera.org:8080/4414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 4: (37 comments) http://gerrit.cloudera.org:8080/#/c/4414/4/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: Line 53: enum THdfsFileFormat { rename Line 63: // rename this enum to not be Hdfs specific. resolve http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: Line 976: tbl_def_without_col_defs:tbl_def a 'create table' without col defs? Line 980: RESULT = new CreateTableStmt(tbl_def); trailing Line 1033: // class doesn't inherit from CreateTableStmt. should it? Line 1065: primary_keys_val ::= opt_primary_keys? Line 1089: tbl_data_layout ::= opt_...? Line 1139: {: fix spaces and tabs Line 1370: KW_PRIMARY key_ident what's wrong with KW_KEY? http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java File fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java: Line 30: static void throwIfNotNullOrNotEmpty(Collection c, String message) this is actually 'not null *and* not empty'. you can also phrase that as 'not empty'. http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java: Line 158: if (fileFormat_ == THdfsFileFormat.KUDU) { check at top http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java: Line 236: String.format("PRIMARY KEY must be used instead of the table property '%s'.", not good: if you do that on an external table, you get this error message instead of 'primary key not allowed'. move this check into the next function. alternatively, why not have a list of valid table properties per storage format, and then flag everything else as invalid? Line 310: distributeParam.setPKColumnDefMap(pkColumnDefsByName); setPkColumn... Line 315: private boolean hasPrimaryKeysSpecified() { hasPrimaryKeySpecified (there's only one, which can be a composite key) hasPrimaryKey would express the same when is it legal to call this (before/after analyze()?) http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/analysis/DistributeParam.java File fe/src/main/java/org/apache/impala/analysis/DistributeParam.java: Line 121: org.apache.impala.catalog.Type colType = colDef.getType(); does simply Type conflict with something? Line 129: if (colType.isStringType() && !exprType.isStringType() this is basically looking for 'assignment compatible', and i'm sure we already have code somewhere to express that more succinctly. Line 150: builder.append(numBuckets_).append(" BUCKETS"); sprinkle some checkstates in here (on numbuckets and splitrows; or maybe a validate() function that does those checks, toThrift() would also benefit from it) Line 200: literal.setString_literal(expr.getStringValue()); checkstate that you're getting something valid Line 211: void setPKColumnDefMap(Map pkColumnDefByName) { setPkC... http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: Line 74: static class TableDefOptions { 'Options' is enough Line 84: // File format of the table since this is clearly not a file format anymore, TStorageFormat? Line 160: fullyQualifiedTableName_ = analyzer.getFqTableName(getTblName()); stick with fq abbreviation? Line 189: for (ColumnDef colDef: getPartitionColumnDefs()) { this is a bit hard to follow. partition cols aren't defined separately, they're declared. so then why do you need to call colDef.analyze()? http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: Line 111: // Distribution schemes of this Kudu table. Both rang and hash-based distributions are range Line 140: return msTbl.getParameters().get(KuduTable.KEY_TABLE_NAME); i don't think this is worth a function call, it just makes the code harder to follow (extra level of indirection) PS4, Line 175: numClusteringCols_ = 0; > not really related to this change, but it's kind of confusing to have numCl those should be the primary key cols http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: Line 1353: "functional.alltypestiny", "Columns cannot be specified with an external " + odd error message. i would expect the 'as select' to be
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 4: (19 comments) http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java File fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java: PS4, Line 97: org.apache should this have changed? http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: PS4, Line 89: com.cloudera can you add a ref to IMPALA-4271 ? PS4, Line 160: know known PS4, Line 159: /** :* The number of nodes is not know ahead of time and will be updated during computeStats :* in the scan node. :*/ : public int getNumNodes() { return -1; } I don't see this used PS4, Line 175: numClusteringCols_ = 0; not really related to this change, but it's kind of confusing to have numClusteringCols_ PS4, Line 226: List cols = msTable_.getSd().getCols(); : cols.clear(); why do we get cols from getCols() and then clear() it? PS4, Line 232: cols.add(new FieldSchema(colName, type.toSql().toLowerCase(), null)); why do we do this? cols isn't used later http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: PS4, Line 460: msTbl.getTableType().equals(TableType.EXTERNAL_TABLE.toString()); we shuold probably compare case insensitive to be safe http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: PS4, Line 1147: occurrs occurs PS4, Line 1482: } catch (Exception e) { : try { : // Error creating the table in HMS, drop the managed table from Kudu. : if (!Table.isExternalTable(newTable)) { : KuduCatalogOpExecutor.dropTable(newTable, false); : } : } catch (Exception logged) { : String kuduTableName = newTable.getParameters().get(KuduTable.KEY_TABLE_NAME); : LOG.error(String.format("Failed to drop Kudu table '%s'", kuduTableName), : logged); : throw new RuntimeException(String.format("Failed to create the table '%s' in " + : " the Metastore and the newly created Kudu table '%s' could not be " + : " dropped. The log contains more information.", newTable.getTableName(), : kuduTableName), e); : } : if (e instanceof AlreadyExistsException && params.if_not_exists) return false; : throw new ImpalaRuntimeException( : String.format(HMS_RPC_ERROR_FORMAT_STR, "createTable"), e); it looks like none of this really needs to be inside the synchronized block. Most of it doesn't matter but dropTable creates a KuduClient and makes an RPC. I could imagine this being a problem if Kudu is overwhelmed. You could catch the exception outside of the synchronized block, maybe some extra handling to know if it was created in HMS or not. Though from the looks of this we assume no exceptions from l1502 and 1503. http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: PS4, Line 232: if (!req.is_delta) { : catalog = new ImpaladCatalog(defaultKuduMasterAddrs_); : } 1line http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java: PS4, Line 135: if (!hasRangePartitioning) { : tableOpts.setRangePartitionColumns(Collections.emptyList()); : } I don't think this is necessary PS4, Line 175: erros errors PS4, Line 192: cols.clear(); can you indicate in the comment that this doesn't just populate msTbl's cols, it replaces whatever was there. PS4, Line 206: new KuduClient I'm not crazy about this wrapper class thing. It's only used in this file. PS4, Line 212: is accessible exists PS4, Line 215: validateTblProperties how about validateKuduTblExists ? PS4, Line 224: Error accessing table in Kudu " + : "master '%s' This could also print the name. Also to avoid confusing with potential future auth rules, can this say something like: "Kudu table TABLENAME on master MASTERADDRS does not exist." http://gerrit.cloudera.
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/4414/3/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: PS3, Line 350: // Parameters needed for hash distribution : struct TDistributeByHashParam { : 1: required list columns : 2: required i32 num_buckets : } : : struct TRangeLiteral { : 1: optional i64 int_literal : 2: optional string string_literal : } : : struct TRangeLiteralList { : 1: required list values : } : : // A range distribution is identified by a list of columns and a series of split rows. : struct TDistributeByRangeParam { : 1: required list columns : 2: optional list split_rows; : } : : // Parameters for the DISTRIBUTE BY clause. The actual distribution is identified by : // the type parameter. : struct TDistributeParam { : // Set if type is set to HASH : 1: optional TDistributeByHashParam by_hash_param; : : // Set if type is set to RANGE : 2: optional TDistributeByRangeParam by_range_param; : } > I don't see why this needs to be in the TKuduTable though. I think we just I consider the distribution scheme to be part of table metadata that are loaded in the catalog. Hence, they live in the KuduTable object and need to be serialized before they get sent to all the impalad nodes. Currently, the distribution schemes are only used for the SHOW CREATE TABLE statement. They should always be set in a TKuduTable. The part that is not set if the list of split values because they are not available from the Kudu API. -- To view, visit http://gerrit.cloudera.org:8080/4414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 3: (6 comments) http://gerrit.cloudera.org:8080/#/c/4414/3/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: PS3, Line 350: // Parameters needed for hash distribution : struct TDistributeByHashParam { : 1: required list columns : 2: required i32 num_buckets : } : : struct TRangeLiteral { : 1: optional i64 int_literal : 2: optional string string_literal : } : : struct TRangeLiteralList { : 1: required list values : } : : // A range distribution is identified by a list of columns and a series of split rows. : struct TDistributeByRangeParam { : 1: required list columns : 2: optional list split_rows; : } : : // Parameters for the DISTRIBUTE BY clause. The actual distribution is identified by : // the type parameter. : struct TDistributeParam { : // Set if type is set to HASH : 1: optional TDistributeByHashParam by_hash_param; : : // Set if type is set to RANGE : 2: optional TDistributeByRangeParam by_range_param; : } > We store the distribute by params in the KuduTable class which is then seri I don't see why this needs to be in the TKuduTable though. I think we just need this in the TCreateTableParams. I wouldn't mind except it's more work to reason about when it's set/not set in a TKuduTable. PS3, Line 390: : // Distribution schemes : 4: required list distribute_by > See my comment above. We do load and serialize the distribute by params in I don't see why this needs to be in TKuduTable, only TCreateTableParams. http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/DistributeParam.java File fe/src/main/java/org/apache/impala/analysis/DistributeParam.java: PS3, Line 156: if (splitRows_ == null) { : builder.append("..."); > This is possible because an object of DistributeParam class is created in t In case #2 why do we even bother creating a DistributeParams? The partitioning/distribution info only really makes sense at the time of table creation I think, right? This goes back to our conversation (in person) about how it'd probably make sense to separate the partitioning part of the command from CREATE TABLE, and do partitioning more like how we would for non-Kudu tables, i.e. separate commands. Obviously that hasn't been worked through and realistically we probably won't get to that in time. If we don't get to something like that, I think the only reason we'd want #2 (in terms of loading DistributeParams) is SHOW CREATE TABLE, so this class only needs to support toSql() in case #2. Is that right? Can you add a comment in the class header or the relevant constructor where we might get case #2 so it's easier to reason about? A TODO / JIRA ref about resolving the '...' would be good. It might be good to add a test case with this output if there isn't one already to make sure we fix it properly and/or it doesn't change in some other unexpected way. PS3, Line 191: if (splitRows_ == null) { : result.setBy_range_param(rangeParam); : return result; : } > See explanation above. I guess it's confusing to me because TDistributeParam doesn't seem to be necessary in case #2. In case #1 it gets added to a TCreateTableStmt object. In case #2 it gets added to a TKuduTable object, but it seems unnecessary given that no other backend or even the catalog care about the distribute params - right? If that's true, I think we could avoid storing this in KuduTable, then avoid calling this in KuduTable.getTKuduTable(), and then throwing here if splitRows_ is null since I don't think this should be called in this case. http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: PS3, Line 126: return fullyQualifiedTableName_ != null ? fullyQualifiedTableName_ : tableName_; > We do call this function before tableName_ has been analyzed. One example i ah ok, can you add a comment? http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java: PS3, Line 243: List primaryKeysSql, String kuduDistributeByParams, can you update the comment, especially around which of these should/can be null vs. empty.
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 3: (23 comments) http://gerrit.cloudera.org:8080/#/c/4414/3/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: PS3, Line 344: // Enum listing all possible DISTRIBUTE BY types : enum TDistributeType { : HASH, : RANGE, > I don't see this used Correct. Removed. PS3, Line 350: // Parameters needed for hash distribution : struct TDistributeByHashParam { : 1: required list columns : 2: required i32 num_buckets : } : : struct TRangeLiteral { : 1: optional i64 int_literal : 2: optional string string_literal : } : : struct TRangeLiteralList { : 1: required list values : } : : // A range distribution is identified by a list of columns and a series of split rows. : struct TDistributeByRangeParam { : 1: required list columns : 2: optional list split_rows; : } : : // Parameters for the DISTRIBUTE BY clause. The actual distribution is identified by : // the type parameter. : struct TDistributeParam { : // Set if type is set to HASH : 1: optional TDistributeByHashParam by_hash_param; : : // Set if type is set to RANGE : 2: optional TDistributeByRangeParam by_range_param; : } > I see that these are used in the serialized catalog objects, but given that We store the distribute by params in the KuduTable class which is then serialized and sent to al the impalads. So, I think we do need them in this case, unless I misunderstood your comment. PS3, Line 390: : // Distribution schemes : 4: required list distribute_by > same for this, can we get rid of it? See my comment above. We do load and serialize the distribute by params in the KuduTable class. http://gerrit.cloudera.org:8080/#/c/4414/3/common/thrift/JniCatalog.thrift File common/thrift/JniCatalog.thrift: PS3, Line 398: CatalogObjects.TDistributeParam > wrt my comments in CatalogObjects, I guess we'd need them here, but I don't See my responses to your comments and let me know if it's still not clear. http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/ColumnDefOptions.java File fe/src/main/java/org/apache/impala/analysis/ColumnDefOptions.java: PS3, Line 21: * Placeholder for the column definition options of a CREATE TABLE statement. : * Contains the list of column definitions and, optionally, the list of column names : * specified using the PRIMARY KEY keyword. > this feels like a weird abstraction... do we really need this class? This was used only during parsing to store the "(col type, col type, PRIMARY KEY (col, ..))" part of the create table stmt. I modified the parser so that this isn't needed anymore. Done http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java: Line 87: InsertStmt getInsertStmt() { return insertStmt_; } > why change visibility? Done http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java: PS3, Line 344: getTblPrimaryKeyColumnNames(), null, > Why would there be PK col names? Done PS3, Line 344: > nit extra space Done http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java: PS3, Line 167: if (getFileFormat() != THdfsFileFormat.KUDU && KuduTable.KUDU_STORAGE_HANDLER.equals( : getTblProperties().get(KuduTable.KEY_STORAGE_HANDLER))) { : throw new AnalysisException(KUDU_STORAGE_HANDLER_ERROR_MESSAGE); : } : : // Avro tables can have empty column defs because they can infer them from the Avro : // schema. Likewise for external Kudu tables, the schema can be read from Kudu. : if (getColumnDefs().isEmpty() && getFileFormat() != THdfsFileFormat.AVRO : && getFileFormat() != THdfsFileFormat.KUDU) { : throw new AnalysisException("Table requires at least 1 column"); : } : : if (getFileFormat() == THdfsFileFormat.KUDU) { : ana
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Dimitris Tsirogiannis has uploaded a new patch set (#4). Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables With this commit we simplify the syntax and handling of CREATE TABLE statements for both managed and external Kudu tables. Syntax example: CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b)) DISTRIBUTE BY HASH (a) INTO 3 BUCKETS, RANGE (b) SPLIT ROWS (('abc', 'def')) STORED AS KUDU Changes: 1) Remove the requirement to specify table properties such as key columns in tblproperties. 2) Read table schema (column definitions, primary keys, and distribution schemes) from Kudu instead of the HMS. 3) For external tables, the Kudu table is now required to exist at the time of creation in Impala. 4) Disallow table properties that could conflict with an existing table. Ex: key_columns cannot be specified. 5) Add KUDU as a file format. 6) Add a startup flag to impalad to specify the default Kudu master addresses. The flag is used as the default value for the table property kudu_master_addresses but it can still be overriden using TBLPROPERTIES. 7) Fix a post merge issue (IMPALA-3178) where DROP DATABASE CASCADE wasn't implemented for Kudu tables and silently ignored. The Kudu tables wouldn't be removed in Kudu. 8) Remove DDL delegates. There was only one functional delegate (for Kudu) the existence of the other delegate and the use of delegates in general has led to confusion. The Kudu delegate only exists to provide functionality missing from Hive. 9) Add PRIMARY KEY at the column and table level. This syntax is fairly standard. When used at the column level, only one column can be marked as a key. When used at the table level, multiple columns can be used as a key. Only Kudu tables are allowed to use PRIMARY KEY. The old "kudu.key_columns" table property is no longer accepted though it is still used internally. "PRIMARY" is now a keyword. "KEY" is expected to be common enough that the ident style declaration is used instead to avoid conflicts. 10) For managed tables, infer a Kudu table name if none was given. The table property "kudu.table_name" is optional for managed tables and is required for external tables. If for a managed table a Kudu table name is not provided, a table name will be generated based on the HMS database and table name. If the database is "default", then a Kudu table name will not include the database. 11) Use Kudu master as the source of truth for table metadata instead of HMS when a table is loaded or refreshed. Table/column metadata are cached in the catalog and are stored in HMS in order to be able to use table and column statistics. Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 --- M be/src/service/frontend.cc M bin/start-catalogd.sh M bin/start-impala-cluster.py M common/thrift/CatalogObjects.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup A fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableDataSrcStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/DistributeParam.java M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java A fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java A fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java D fe/src/main/java/org/apache/impala/catalog/delegates/DdlDelegate.java D fe/src/main/java/org/apache/impala/catalog/delegates/KuduDdlDelegate.java D fe/src/main/java/org/apache/impala/catalog/delegates/UnsupportedOpDelegate.java M fe/src/main/java/org/apache/impala/planner/HdfsPartitionFilter.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java A fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java A fe/src/main/java/org/apache/impala/util/KuduClient.
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 3: (14 comments) next batch... I'm probably about 1/2 way through. http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java File fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java: PS1, Line 275: : : : : : : : > Actually, I don't find it such a terrible idea. The sooner we figure out in It's just more code to support and test... http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/analysis/DistributeParam.java File fe/src/main/java/com/cloudera/impala/analysis/DistributeParam.java: PS1, Line 138: : : > There is a restriction on the types of primary keys (no bool, float or doub Thanks, makes sense http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java: Line 87: InsertStmt getInsertStmt() { return insertStmt_; } why change visibility? http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java: PS3, Line 344: nit extra space PS3, Line 344: getTblPrimaryKeyColumnNames(), null, Why would there be PK col names? Can you add a neg test case to AnalyzeDDLTest.java? AnalysisError("create table newkudutbl like kudu '/test-warehouse/schemas/alltypestiny.parquet'") http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java: PS3, Line 167: if (getFileFormat() != THdfsFileFormat.KUDU && KuduTable.KUDU_STORAGE_HANDLER.equals( : getTblProperties().get(KuduTable.KEY_STORAGE_HANDLER))) { : throw new AnalysisException(KUDU_STORAGE_HANDLER_ERROR_MESSAGE); : } : : // Avro tables can have empty column defs because they can infer them from the Avro : // schema. Likewise for external Kudu tables, the schema can be read from Kudu. : if (getColumnDefs().isEmpty() && getFileFormat() != THdfsFileFormat.AVRO : && getFileFormat() != THdfsFileFormat.KUDU) { : throw new AnalysisException("Table requires at least 1 column"); : } : : if (getFileFormat() == THdfsFileFormat.KUDU) { : analyzeKuduFormat(analyzer); : } else { : AnalysisUtils.throwIfNotNullOrNotEmpty(getDistributeParams(), : "Only Kudu tables can use DISTRIBUTE BY clause."); : throwIfPrimaryKeysWereSpecified("Only Kudu tables can specify a PRIMARY KEY."); : } : : if (getFileFormat() == THdfsFileFormat.AVRO) { : setColumnDefs(analyzeAvroSchema(analyzer)); : if (getColumnDefs().isEmpty()) { : throw new AnalysisException( : "An Avro table requires column definitions or an Avro schema."); : } : AvroSchemaUtils.setFromSerdeComment(getColumnDefs()); : } : } what if we were to move everything Kudu related here into analyzeKuduFormat() which first checks if it is Kudu. If it is not, it can make sure it _doesn't_ have the distribute or PK params, then perform the storage handler check, then return. If it is a Kudu table it'd mostly be the same as it is. Basically we could just always call the fn and it handles Kudu or no-Kudu checks appropriately. Then we'd get rid of some of the Kudu mess here. PS3, Line 279: throw new AnalysisException(String.format("Column '%s' (type: '%s') cannot " + : "be used as a primary key. Keys must be integer or string types.", : pkColDef.getColName(), pkColDef.getType().toSql())); I don't see a test case for this error PS3, Line 277: ColumnDef pkColDef = getPrimaryKeyColumnDefs().get(i); : if (!KuduUtil.isSupportedKeyType(pkColDef.getType())) { : throw new AnalysisException(String.format("Column '%s' (type: '%s') cannot " + : "be used as a primary key. Keys must be integer or string types.", : pkColDef.getColName(), pkColDef.getType().toSql())); : } : if (!pkColDef.equals(getColumnDefs().get(i))) {
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 3: (11 comments) Not even close to done but want to start giving you feedback in pieces so we can parallelize a bit. http://gerrit.cloudera.org:8080/#/c/4414/3/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: PS3, Line 344: // Enum listing all possible DISTRIBUTE BY types : enum TDistributeType { : HASH, : RANGE, I don't see this used PS3, Line 350: // Parameters needed for hash distribution : struct TDistributeByHashParam { : 1: required list columns : 2: required i32 num_buckets : } : : struct TRangeLiteral { : 1: optional i64 int_literal : 2: optional string string_literal : } : : struct TRangeLiteralList { : 1: required list values : } : : // A range distribution is identified by a list of columns and a series of split rows. : struct TDistributeByRangeParam { : 1: required list columns : 2: optional list split_rows; : } : : // Parameters for the DISTRIBUTE BY clause. The actual distribution is identified by : // the type parameter. : struct TDistributeParam { : // Set if type is set to HASH : 1: optional TDistributeByHashParam by_hash_param; : : // Set if type is set to RANGE : 2: optional TDistributeByRangeParam by_range_param; : } I see that these are used in the serialized catalog objects, but given that none of this is stored in the metastore nor do we require it to be stored in the catalog, I don't see why we bother keeping all this in the thrift table object. Can we just remove it and not serialize this info? Hopefully we can even split the distribute/partition parameters out of CREATE TABLE eventually, anyway. PS3, Line 390: : // Distribution schemes : 4: required list distribute_by same for this, can we get rid of it? http://gerrit.cloudera.org:8080/#/c/4414/3/common/thrift/JniCatalog.thrift File common/thrift/JniCatalog.thrift: PS3, Line 398: CatalogObjects.TDistributeParam wrt my comments in CatalogObjects, I guess we'd need them here, but I don't see why we bother storing them in the serialized catalog table object. http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/ColumnDefOptions.java File fe/src/main/java/org/apache/impala/analysis/ColumnDefOptions.java: PS3, Line 21: * Placeholder for the column definition options of a CREATE TABLE statement. : * Contains the list of column definitions and, optionally, the list of column names : * specified using the PRIMARY KEY keyword. this feels like a weird abstraction... do we really need this class? http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java File fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java: PS3, Line 24: class TableDataLayout { : : private final List partitionColDefs_; : private final List distributeParams_; Not that we're doing this in this review, but we need to think about how this might change when we do the next steps for Kudu partitioning. PS3, Line 49: List getPartitionColumnDefs() { return partitionColDefs_; } : List getDistributeParams() { return distributeParams_; } any reason these aren't public? http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: PS3, Line 1: // Copyright 2016 Cloudera Inc. : // : // Licensed under the Apache License, Version 2.0 (the "License"); : // you may not use this file except in compliance with the License. : // You may obtain a copy of the License at : // : // http://www.apache.org/licenses/LICENSE-2.0 : // : // Unless required by applicable law or agreed to in writing, software : // distributed under the License is distributed on an "AS IS" BASIS, : // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. : // See the License for the specific language governing permissions and : // limitations under the License. this and maybe others need the new license header: // Licensed to the Apache Software Foundation (ASF) under one // or more contributor license agreements. See the NOTICE file // distributed
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Dimitris Tsirogiannis has uploaded a new patch set (#3). Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables With this commit we simplify the syntax and handling of CREATE TABLE statements for both managed and external Kudu tables. Syntax example: CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b)) DISTRIBUTE BY HASH (a) INTO 3 BUCKETS, RANGE (b) SPLIT ROWS (('abc', 'def')) STORED AS KUDU Changes: 1) Remove the requirement to specify table properties such as key columns in tblproperties. 2) Read table schema (column definitions, primary keys, and distribution schemes) from Kudu instead of the HMS. 3) For external tables, the Kudu table is now required to exist at the time of creation in Impala. 4) Disallow table properties that could conflict with an existing table. Ex: key_columns cannot be specified. 5) Add KUDU as a file format. 6) Add a startup flag to impalad to specify the default Kudu master addresses. The flag is used as the default value for the table property kudu_master_addresses but it can still be overriden using TBLPROPERTIES. 7) Fix a post merge issue (IMPALA-3178) where DROP DATABASE CASCADE wasn't implemented for Kudu tables and silently ignored. The Kudu tables wouldn't be removed in Kudu. 8) Remove DDL delegates. There was only one functional delegate (for Kudu) the existence of the other delegate and the use of delegates in general has led to confusion. The Kudu delegate only exists to provide functionality missing from Hive. 9) Add PRIMARY KEY at the column and table level. This syntax is fairly standard. When used at the column level, only one column can be marked as a key. When used at the table level, multiple columns can be used as a key. Only Kudu tables are allowed to use PRIMARY KEY. The old "kudu.key_columns" table property is no longer accepted though it is still used internally. "PRIMARY" is now a keyword. "KEY" is expected to be common enough that the ident style declaration is used instead to avoid conflicts. 10) For managed tables, infer a Kudu table name if none was given. The table property "kudu.table_name" is optional for managed tables and is required for external tables. If for a managed table a Kudu table name is not provided, a table name will be generated based on the HMS database and table name. If the database is "default", then a Kudu table name will not include the database. 11) Use Kudu master as the source of truth for table metadata instead of HMS when a table is loaded or refreshed. Table/column metadata are cached in the catalog and are stored in HMS in order to be able to use table and column statistics. Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 --- M be/src/service/frontend.cc M bin/start-catalogd.sh M bin/start-impala-cluster.py M common/thrift/CatalogObjects.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup A fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java A fe/src/main/java/org/apache/impala/analysis/ColumnDefOptions.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableDataSrcStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/DistributeParam.java M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java A fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java A fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java D fe/src/main/java/org/apache/impala/catalog/delegates/DdlDelegate.java D fe/src/main/java/org/apache/impala/catalog/delegates/KuduDdlDelegate.java D fe/src/main/java/org/apache/impala/catalog/delegates/UnsupportedOpDelegate.java M fe/src/main/java/org/apache/impala/planner/HdfsPartitionFilter.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java A fe/src/main/java/org/apache/impala/service/KuduCatalogO
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 1: (94 comments) http://gerrit.cloudera.org:8080/#/c/4414/1//COMMIT_MSG Commit Message: PS1, Line 21: When attempting to create an external : table "foo" in database "bar", Impala will search for a Kudu table : name "foo.bar" and "bar" (Kudu doesn't have database name spaces : yet.) > I think this sentence is duplicated by #10. Maybe remove this part. This it Done PS1, Line 25: The Kudu table is now required to exist at the time of creation in : Impala. > for external tables? or if this is more general can you explain a bit more? Done PS1, Line 31: default value > can it still be overridden? Done PS1, Line 39: Eventually Hive should have the needed : functionality and the Kudu delegate (renamed in this patch to KuduCatalogOpExecutor) : can be removed. > maybe, it's not clear if impala would want to keep the ability to do this o Done PS1, Line 54: If the database is "default" then : the Kudu table name will not include the database. > hm... I wonder if it'd be easier just to keep it standardized and always ha Done PS1, Line 56: 11) Several improvements in the grammar related to the family : of CREATE TABLE statements. > I don't think this bullet adds much value to the commit message. Everything Done PS1, Line 58: 12) Added new tests and modified existing Kudu test to use the new : CREATE TABLE syntax. > I think this can be removed too, it should be implied there are tests and i Done PS1, Line 60: insteads > instead Done PS1, Line 60: source of truth > Does this mean we always hit Kudu for metadata? In the Catalog? In the impa Done PS1, Line 64: Not included in this commit: : - Additional column properties such as nullability and compression : encodings. > I don't think you need to include this, we have a separate JIRA for this. Done http://gerrit.cloudera.org:8080/#/c/4414/1/be/src/catalog/catalog.cc File be/src/catalog/catalog.cc: PS1, Line 47: {"", "(ZILjava/lang/String;IIZLjava/lang/String;)V", : &catalog_ctor_}, > 1 line Done http://gerrit.cloudera.org:8080/#/c/4414/1/be/src/service/frontend.cc File be/src/service/frontend.cc: PS1, Line 37: // XXX: This flag doesn't seem to be used anywhere. Maybe remove it? : DEFINE_bool(load_catalog_at_startup, false, "if true, load all catalog data at startup"); > agreed. doesn't look like it's used by JniFrontend. Let's remove it. Done PS1, Line 63: IPs > IP addresses. Done PS1, Line 63: > no space Done http://gerrit.cloudera.org:8080/#/c/4414/1/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: PS1, Line 52: THdfsFileFormat > Maybe not for this patch since it's already huge, but it'd be great to gene I agree we need to revisit this. Left a TODO for now. PS1, Line 61: THdfsCompression > similarly, this seems unnecessarily specific to Hdfs. Not necessarily somet Left TODO. http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: > FYI I'm not looking at this very carefully because it was reviewed previous Only few things changed mostly wrt specifying the PRIMARY KEY keyword inside the column definition list. PS1, Line 31: ColumnDefOptions > this class doesn't exist, please add the missing file Done PS1, Line 405: view_column_def_list, view_column_defs; > separate nonterminals if they dont fit? Done PS1, Line 458: key_ident; > does this not need Boolean like those below? Done http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java File fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java: > I'm not in love with the TableDef and TableDefOptions stuff because it feel As we spoke offline, the initial attempt to refactor this logic didn't succeed. I will not change these classes but I'll make an effort to separate the Kudu-specific logic at the function level. PS1, Line 240: getColumnDefs().add(KuduCatalogOpExecutor.REQUIRED_THRIFT_COLUMN_DEF); > is this needed? I thought there was going to be code that fetches the colum We do, but that happens in the catalog. This was added to satisfy the thrift spec. PS1, Line 242: if (!getTblProperties().containsKey(KuduTable.KEY_TABLE_NAME)) { : getTblProperties().put(KuduTable.KEY_TABLE_NAME, : KuduUtil.getDefaultCreateKuduTableName(getDb(), getTbl())); : } > hm... can we simplify this and only allow internal tables to be created wit This is executed only for managed tables. Is this what you meant? Also, I changed the prefix to be __IMPALA__db.table;
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Dimitris Tsirogiannis has uploaded a new patch set (#2). Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables With this commit we simplify the syntax and handling of CREATE TABLE statements for both managed and external Kudu tables. Syntax example: CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b)) DISTRIBUTE BY HASH (a) INTO 3 BUCKETS, RANGE (b) SPLIT ROWS (('abc', 'def')) STORED AS KUDU Changes: 1) Remove the requirement to specify table properties such as key columns in tblproperties. 2) Read table schema (column definitions, primary keys, and distribution schemes) from Kudu instead of the HMS. 3) For external tables, the Kudu table is now required to exist at the time of creation in Impala. 4) Disallow table properties that could conflict with an existing table. Ex: key_columns cannot be specified. 5) Add KUDU as a file format. 6) Add a startup flag to impalad to specify the default Kudu master addresses. The flag is used as the default value for the table property kudu_master_addresses but it can still be overriden using TBLPROPERTIES. 7) Fix a post merge issue (IMPALA-3178) where DROP DATABASE CASCADE wasn't implemented for Kudu tables and silently ignored. The Kudu tables wouldn't be removed in Kudu. 8) Remove DDL delegates. There was only one functional delegate (for Kudu) the existence of the other delegate and the use of delegates in general has led to confusion. The Kudu delegate only exists to provide functionality missing from Hive. 9) Add PRIMARY KEY at the column and table level. This syntax is fairly standard. When used at the column level, only one column can be marked as a key. When used at the table level, multiple columns can be used as a key. Only Kudu tables are allowed to use PRIMARY KEY. The old "kudu.key_columns" table property is no longer accepted though it is still used internally. "PRIMARY" is now a keyword. "KEY" is expected to be common enough that the ident style declaration is used instead to avoid conflicts. 10) For managed tables, infer a Kudu table name if none was given. The table property "kudu.table_name" is optional for managed tables and is required for external tables. If for a managed table a Kudu table name is not provided, a table name will be generated based on the HMS database and table name. If the database is "default", then a Kudu table name will not include the database. 11) Use Kudu master as the source of truth for table metadata instead of HMS when a table is loaded or refreshed. Table/column metadata are cached in the catalog and are stored in HMS in order to be able to use table and column statistics. Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 --- M be/src/service/frontend.cc M bin/start-catalogd.sh M bin/start-impala-cluster.py M common/thrift/CatalogObjects.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup A fe/src/main/java/com/cloudera/impala/analysis/AnalysisUtils.java M fe/src/main/java/com/cloudera/impala/analysis/ColumnDef.java A fe/src/main/java/com/cloudera/impala/analysis/ColumnDefOptions.java M fe/src/main/java/com/cloudera/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/com/cloudera/impala/analysis/CreateTableDataSrcStmt.java M fe/src/main/java/com/cloudera/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/com/cloudera/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java M fe/src/main/java/com/cloudera/impala/analysis/DistributeParam.java M fe/src/main/java/com/cloudera/impala/analysis/ModifyStmt.java A fe/src/main/java/com/cloudera/impala/analysis/TableDataLayout.java A fe/src/main/java/com/cloudera/impala/analysis/TableDef.java M fe/src/main/java/com/cloudera/impala/analysis/ToSqlUtils.java M fe/src/main/java/com/cloudera/impala/catalog/Catalog.java M fe/src/main/java/com/cloudera/impala/catalog/Db.java M fe/src/main/java/com/cloudera/impala/catalog/HdfsFileFormat.java M fe/src/main/java/com/cloudera/impala/catalog/ImpaladCatalog.java M fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java M fe/src/main/java/com/cloudera/impala/catalog/Table.java M fe/src/main/java/com/cloudera/impala/catalog/TableLoader.java D fe/src/main/java/com/cloudera/impala/catalog/delegates/DdlDelegate.java D fe/src/main/java/com/cloudera/impala/catalog/delegates/KuduDdlDelegate.java D fe/src/main/java/com/cloudera/impala/catalog/delegates/UnsupportedOpDelegate.java M fe/src/main/java/com/cloudera/impala/planner/HdfsPartitionFilter.java M fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java M fe/src/main/java/com/cloudera/impala/service/Frontend.java M fe/src/main/java/com/cloudera/impala/service/JniCatalog.java M fe/src/main/java/com/cloudera/impala/service/JniFrontend.java A
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 1: (3 comments) sorry, more comments I keep finding new things as I'm adding some new test cases myself. http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java File fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java: PS1, Line 87: : // Key to specify the number of tablet replicas. : // TODO(KUDU): Allow modification in alter table. : public static final String KEY_TABLET_REPLICAS = "kudu.num_tablet_replicas"; > I don't think we need to store this, even if we wanted to modify this later After looking a bit more, I see that this is the way the user specifies the # of replicas and there is no other way. I guess it's weird to strip out table properties added by the user, so I suppose we should keep it. Instead, it'd be worth us considering separating out options that need to be persisted (i.e. we need the mapping to kudu tbl name) vs things that just get passed through and we shouldn't get in the business of trying to replicate in our own metadata. Probably not P1 but worth a JIRA since this is user facing and will sit in metadata. Also: 1) the TODO(KUDU) that is here could be removed. 2) would be worth testing how this behaves with CREATE TABLE LIKE. I'd expect it to inherit this value but it'd be worth checking. http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/test/java/com/cloudera/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/com/cloudera/impala/analysis/AnalyzeDDLTest.java: PS1, Line 28: import com.cloudera.impala.catalog.KuduTable; : import junit.framework.Assert; nit: organize imports http://gerrit.cloudera.org:8080/#/c/4414/1/testdata/workloads/functional-query/queries/QueryTest/create_kudu.test File testdata/workloads/functional-query/queries/QueryTest/create_kudu.test: > we might still need to have explicit create table tests. Also, I don't see any tests for CTAS here or anywhere. Can you add a few positive (e.g. from kudu_functional.alltypes) and negative tests (e.g. involving datatypes that kudu doesn't support, e.g. functional.alltypes and functional.decimaltbl ). -- To view, visit http://gerrit.cloudera.org:8080/4414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 1: (9 comments) ... a few more things http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/analysis/ToSqlUtils.java File fe/src/main/java/com/cloudera/impala/analysis/ToSqlUtils.java: PS1, Line 223: Preconditions.checkNotNull(kuduDistributeByParams); This breaks for tables that were created before this change. http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java File fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java: PS1, Line 87: : // Key to specify the number of tablet replicas. : // TODO(KUDU): Allow modification in alter table. : public static final String KEY_TABLET_REPLICAS = "kudu.num_tablet_replicas"; I don't think we need to store this, even if we wanted to modify this later. Further, it doesn't look like the API exposes a way to do that anyway. Can you add a TODO or remove this? PS1, Line 92: public static final String KEY_DISTRIBUTE_BY = "kudu.distribute_by"; I'm not sure we should be storing this. The Kudu client exposes KuduTable.getPartitionSchema(). It looks like it's just for use by SHOW CREATE TABLE, and it'd be good for us to use the PartitionSchema from them which we should probably be able to stringify. I'm OK with leaving that for later as long as you think it won't be an issue for dev environments to upgrade between state now -> this review as-is -> making the change i suggested http://gerrit.cloudera.org:8080/#/c/4414/1/testdata/workloads/functional-query/queries/QueryTest/create_kudu.test File testdata/workloads/functional-query/queries/QueryTest/create_kudu.test: we might still need to have explicit create table tests. Specifically we should test that you can create an internal table with a different kudu table name by overriding 'kudu.table_name'. We could basically modify the managed_kudu table test case that's here to do so. I think we need to find a way to modify the kudu schema (e.g. drop and add columns) and see if we handle gracefully. Seems OK to leave that as a future task, but we should at least know how we handle those cases. http://gerrit.cloudera.org:8080/#/c/4414/1/testdata/workloads/functional-query/queries/QueryTest/kudu-show-create.test File testdata/workloads/functional-query/queries/QueryTest/kudu-show-create.test: well, we should have some coverage of show create table... http://gerrit.cloudera.org:8080/#/c/4414/1/testdata/workloads/functional-query/queries/QueryTest/kudu_partition_ddl.test File testdata/workloads/functional-query/queries/QueryTest/kudu_partition_ddl.test: PS1, Line 15: : : : : : : : : : : I think this kind of error should still be covered. You can still mismatch PK cols and dist by columns. PS1, Line 37: : we should still cover this as well. PS1, Line 50: : same Some more questions about losing coverage. Fine if this is now covered by FE tests. -- To view, visit http://gerrit.cloudera.org:8080/4414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 1: (6 comments) http://gerrit.cloudera.org:8080/#/c/4414/1/testdata/bin/generate-schema-statements.py File testdata/bin/generate-schema-statements.py: PS1, Line 196: primay primary http://gerrit.cloudera.org:8080/#/c/4414/1/tests/common/__init__.py File tests/common/__init__.py: does this need to be its own file? http://gerrit.cloudera.org:8080/#/c/4414/1/tests/common/kudu_test_suite.py File tests/common/kudu_test_suite.py: PS1, Line 54: @classmethod : def get_db_name(cls): : # When py.test runs with the xdist plugin, several processes are started and each : # process runs some partition of the tests. It's possible that multiple processes : # will call this method. A random value is generated so the processes won't try : # to use the same database at the same time. The value is cached so within a single : # process the same database name is always used for the class. This doesn't need to : # be thread-safe since multi-threading is never used. : if not cls.__DB_NAME: : cls.__DB_NAME = \ : choice(ascii_lowercase) + "".join(sample(ascii_lowercase + digits, 5)) : return cls.__DB_NAME I've always disliked this function... I think we should try to use the db fixture that Michael Brown added. Maybe I'm missing something though. It'd be good to get Michael's input here. PS1, Line 127: mapping = {BOOL: "BOOLEAN", : DOUBLE: "DOUBLE", : FLOAT: "FLOAT", : INT16: "SMALLINT", : INT32: "INT", : INT64: "BIGINT", : INT8: "TINYINT", : STRING: "STRING"} this gets defined as a local every function invocation, right? how about creating the map outside the fn so its static. http://gerrit.cloudera.org:8080/#/c/4414/1/tests/custom_cluster/test_kudu.py File tests/custom_cluster/test_kudu.py: PS1, Line 25: TestRedaction ? PS1, Line 42: temp_kudu_table I'd think we could have the test fn take the database fixture and then pass it to temp_kudu_table -- To view, visit http://gerrit.cloudera.org:8080/4414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 1: (31 comments) Ok, part 3. I have one more batch left... http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java File fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java: PS1, Line 222: : : : : : : was this moved somewhere else? Maybe I'll find it as I keep reading... PS1, Line 238: // Update the HMS : if (reuseMetadata) { : try { : client.alter_table(msTable_.getDbName(), msTable_.getTableName(), msTable_); : } catch (TException e) { : throw new TableLoadingException(e.getMessage()); : } : } Same deal, I'm not sure why reuseMetadata means update the HMS. http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java File fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java: PS1, Line 220: : private final static KuduCatalogOpExecutor kuduExecutor_ : = new KuduCatalogOpExecutor(); all static methods, why instantiate it? PS1, Line 1122: // The db == null case isn't handled. The only tables this should matter for are : // Kudu tables. The expectation is Impala will always know about any Kudu tables : // because Hive doesn't support Kudu yet. I've always had a problem with this comment :/ even after trimming it down I don't think it makes complete sense. This should mention briefly what this code _is handling_ before it talks about what it's not handling. Can you add 1 sentence at the beginning? PS1, Line 1124: When Hive supports Kudu the DDL delegates : // can be removed. https://issues.cloudera.org/browse/IMPALA-3424 tracks the removal. Can you remove these sentences and just leave IMPALA-3424 as a reference? We may not remove our kudu code even if/when Hive supports Kudu. PS1, Line 1127: , nit: extra comma PS1, Line 1150: // The operation will be aborted if the Kudu table cannot be dropped. If for : // some reason Kudu is permanently stuck in a non-functional state, the user is : // expected to ALTER TABLE to either set the table to UNMANAGED or set the format : // to something else. JIRA? Even if we don't fix it we should have something public to point to regarding this behavior. Also to make sure the docs team knows to document it. PS1, Line 1205: kuduExecutor_ static ref PS1, Line 1406: if (KuduTable.isKuduTable(tbl)) { : return createKuduTable(tbl, params.if_not_exists, params.getDistribute_by(), : response); : } : return createTable(tbl, params.if_not_exists, params.getCache_op(), response); : } it's too bad we have to just branch like this and handle everything differently, but I can't think of anything better. PS1, Line 1420: createMetaStoreTable I'd prefer to have this on the prev line and the parameter on the new line Line 1475:* creation of a managed Kudu table. comment on response param PS1, Line 1480: TDdlExecResponse response) I think this just fits on the prev line PS1, Line 1483: kuduExecutor_ static reference KuduCatalogOpExecutor, here elsewhere in this fn PS1, Line 1509: // Add the table to the catalog cache : Table newTbl = catalog_.addTable(newTable.getDbName(), newTable.getTableName()); : addTableToCatalogUpdate(newTbl, response.result); While I don't think these will throw, it might be worth wrapping all the logic after the Kudu create table in a try { } catch block that drops the kudu table. That'll future proof this a bit. Feel free to ignore though. http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/service/Frontend.java File fe/src/main/java/com/cloudera/impala/service/Frontend.java: PS1, Line 230: impaladCatalog_.getDefaultKuduMasterAddrs() this is fine but a little weird, normally I'm all about removing extra state but in this case it might be better to keep this as a parameter on frontend and pass it in. Unless there's somewhere else besides the old impaladCatalog we can get it? http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/service/JniCatalog.java File fe/src/main/java/com/cloudera/impala/service/JniCatalog.java: PS1, Line 83: : int otherLogLevel, boolean allowAuthToLocal, String kerberosPrincipal) wrapping http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/servi
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 1: (17 comments) another batch of comments, still a lot of files i haven't touched yet... http://gerrit.cloudera.org:8080/#/c/4414/1/be/src/catalog/catalog.cc File be/src/catalog/catalog.cc: PS1, Line 47: {"", "(ZILjava/lang/String;IIZLjava/lang/String;)V", : &catalog_ctor_}, 1 line http://gerrit.cloudera.org:8080/#/c/4414/1/be/src/service/frontend.cc File be/src/service/frontend.cc: PS1, Line 37: // XXX: This flag doesn't seem to be used anywhere. Maybe remove it? : DEFINE_bool(load_catalog_at_startup, false, "if true, load all catalog data at startup"); agreed. doesn't look like it's used by JniFrontend. Let's remove it. PS1, Line 63: IPs IP addresses. PS1, Line 63: no space http://gerrit.cloudera.org:8080/#/c/4414/1/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: PS1, Line 52: THdfsFileFormat Maybe not for this patch since it's already huge, but it'd be great to generalize this if we can. I can think of two improvements: 1) Maybe we should model the storage layer, e.g. have a TStorageEngine, then make this TFileFormat (perhaps). This is probably a big change. 2) Rename this to be TStorageFormat, which kind of addresses #1 but doesn't separate out storage engines and file formats. PS1, Line 61: THdfsCompression similarly, this seems unnecessarily specific to Hdfs. Not necessarily something to change now but maybe we can create a follow-up JIRA to clean this up. http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: PS1, Line 31: ColumnDefOptions this class doesn't exist, please add the missing file http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/analysis/DistributeParam.java File fe/src/main/java/com/cloudera/impala/analysis/DistributeParam.java: PS1, Line 84: BigDecimal why BigDecimal? Ultimately this has to resolve to some int for kudu's API. (We can check if it's 32 or 64bit). PS1, Line 110: <= 1) is 1 bucket actually not valid? PS1, Line 138: colType.isStringType() && !exprType.isStringType() : || colType.isIntegerType() && (!exprType.isIntegerType() : || exprType.getPrecision() > colType.getPrecision()) 1. I don't see anything in the Kudu client that explicitly says you can't partition on any particular types. This code will exclude boolean and floating pt types, which is maybe unnecessary. 2. Esp if we can address #1, is there a cleaner way to make sure the types are valid rather than enumerating the kinds of types to consider? I'm not sure, but maybe one of the frontend gurus can think of something. http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/catalog/HdfsFileFormat.java File fe/src/main/java/com/cloudera/impala/catalog/HdfsFileFormat.java: PS1, Line 39: HdfsFileFormat Can you open a JIRA (and leave it in the comment) to refactor this later? http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java File fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java: PS1, Line 79: : // TODO we should have something like KuduConfig.getDefaultConfig() do you know what this means / can it be removed now that we're adding default master addrs? Line 92: public static final String KEY_DISTRIBUTE_BY = "kudu.distribute_by"; comment Line 153:* Load the columns from the schema list can you add a comment about error handling in this function? PS1, Line 158: LOG.error(String.format("Kudu tables must have at least one" : + "key column (had %d), and no more key columns than there are table columns " : + "(had %d).", keyColumns.size(), schema.size())); shouldn't this still fail? if not, can you add a comment why this continues? PS1, Line 184: LOG.error(String.format("Some key columns were not found in" : + " the set of columns. List of column names: %s, List of key column names:" : + " %s", Iterables.toString(columnNames), Iterables.toString(keyColumns))); why do we continue? PS1, Line 199: // Get the table metadata from Kudu : if (reuseMetadata) { I'm confused about this. It's not clear to me from the name 'reuseMetadata' why this means we should populate the metadata from Kudu. If anything, it sounds like it would be the opposite. The base class comment just says "If 'reuseMetadata' is true, reuse valid existing metadata.". -- To view, visit http://gerrit.cloudera.org:8080/4414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7b9d51b2720
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 1: (28 comments) Ok, here's a bit to get started. Just made my way through most of analysis. Still have a lot to get through but figured I'll give you the feedback I have while I do a few other things. http://gerrit.cloudera.org:8080/#/c/4414/1//COMMIT_MSG Commit Message: PS1, Line 21: When attempting to create an external : table "foo" in database "bar", Impala will search for a Kudu table : name "foo.bar" and "bar" (Kudu doesn't have database name spaces : yet.) I think this sentence is duplicated by #10. Maybe remove this part. This item is about reading the columns, right? The issue of table name resolution is separate IMO, and covered by #10, though consider moving them next together :) PS1, Line 25: The Kudu table is now required to exist at the time of creation in : Impala. for external tables? or if this is more general can you explain a bit more? PS1, Line 31: default value can it still be overridden? PS1, Line 39: Eventually Hive should have the needed : functionality and the Kudu delegate (renamed in this patch to KuduCatalogOpExecutor) : can be removed. maybe, it's not clear if impala would want to keep the ability to do this on its own... Remove? PS1, Line 54: If the database is "default" then : the Kudu table name will not include the database. hm... I wonder if it'd be easier just to keep it standardized and always have the name, i.e. using default in this case. What do you think? I'll see if I change my mind after reading more code. PS1, Line 56: 11) Several improvements in the grammar related to the family : of CREATE TABLE statements. I don't think this bullet adds much value to the commit message. Everything else is more user visible, this is just some necessary code cleanup. PS1, Line 58: 12) Added new tests and modified existing Kudu test to use the new : CREATE TABLE syntax. I think this can be removed too, it should be implied there are tests and it takes away from the billion other "features" in this patch :) PS1, Line 60: source of truth Does this mean we always hit Kudu for metadata? In the Catalog? In the impalad's Catalog? PS1, Line 60: insteads instead PS1, Line 64: Not included in this commit: : - Additional column properties such as nullability and compression : encodings. I don't think you need to include this, we have a separate JIRA for this. http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: FYI I'm not looking at this very carefully because it was reviewed previously https://gerrit.cloudera.org/#/c/2865/ Let me know if this has changed much since then. PS1, Line 405: view_column_def_list, view_column_defs; separate nonterminals if they dont fit? PS1, Line 458: key_ident; does this not need Boolean like those below? http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java File fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java: I'm not in love with the TableDef and TableDefOptions stuff because it feels like that's parser implementation concerns bleeding into analysis code. Do you think we need both? Can we fold the latter into TableDef? If we have to keep both, it'd be helpful to at least group the functions in this file based on their source (e.g. TableDef, TDOptions, or on this itself) with comments around the groups? Maybe within those groups they could also be sorted by visibility? There's a lot of stuff going on so I think it'd be helpful to keep it tidy. PS1, Line 240: getColumnDefs().add(KuduCatalogOpExecutor.REQUIRED_THRIFT_COLUMN_DEF); is this needed? I thought there was going to be code that fetches the columns from Kudu and stores them. PS1, Line 242: if (!getTblProperties().containsKey(KuduTable.KEY_TABLE_NAME)) { : getTblProperties().put(KuduTable.KEY_TABLE_NAME, : KuduUtil.getDefaultCreateKuduTableName(getDb(), getTbl())); : } hm... can we simplify this and only allow internal tables to be created with the name db_name.table_name? Obviously there could be conflicts and it should fail then. Then it'd be 1 less case to consider. Also, I wonder if we should prefix the underlying kudu table names with something, e.g. IMPALA_db.table? PS1, Line 275: // Kudu's data distribution rules are enforced here. A good reference is : // http://getkudu.io/docs/schema_design.html#data-distribution. As a summary: : // * Only primary key column can be used in distribution definitions :
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. Patch Set 1: There are definitely some rough edges but I am posting the CR so that we can iterate through it sooner than later. I'll keep refining it but all comments are welcomed. -- To view, visit http://gerrit.cloudera.org:8080/4414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
Dimitris Tsirogiannis has uploaded a new change for review. http://gerrit.cloudera.org:8080/4414 Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables .. IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables With this commit we simplify the syntax and handling of CREATE TABLE statements for both managed and external Kudu tables. Syntax example: CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b)) DISTRIBUTE BY HASH (a) INTO 3 BUCKETS, RANGE (b) SPLIT ROWS (('abc', 'def')) STORED AS KUDU Changes: 1) Remove the requirement to specify table properties such as key columns in tblproperties. 2) Read table schema from Kudu. When attempting to create an external table "foo" in database "bar", Impala will search for a Kudu table name "foo.bar" and "bar" (Kudu doesn't have database name spaces yet.) 3) The Kudu table is now required to exist at the time of creation in Impala. 4) Disallow table properties that could conflict with an existing table. Ex: key_columns cannot be specified. 5) Add KUDU as a file format. 6) Add a startup flag to impalad to specify the default Kudu master addresses. The flag is used as the default value for the table property kudu_master_addresses. 7) Fix a post merge issue (IMPALA-3178) where DROP DATABASE CASCADE wasn't implemented for Kudu tables and silently ignored. The Kudu tables wouldn't be removed in Kudu. 8) Remove DDL delegates. There was only one functional delegate (for Kudu) the existence of the other delegate and the use of delegates in general has led to confusion. The Kudu delegate only exists to provide functionality missing from Hive. Eventually Hive should have the needed functionality and the Kudu delegate (renamed in this patch to KuduCatalogOpExecutor) can be removed. 9) Add PRIMARY KEY at the column and table level. This syntax is fairly standard. When used at the column level, only one column can be marked as a key. When used at the table level, multiple columns can be used as a key. Only Kudu tables are allowed to use PRIMARY KEY. The old "kudu.key_columns" table property is no longer accepted though it is still used internally. "PRIMARY" is now a keyword. "KEY" is expected to be common enough that the ident style declaration is used instead to avoid conflicts. 10) Infer a Kudu table name if none was given. The table property "kudu.table_name" is now optional. If not given, the Kudu table name will be created based on the Hive Metastore database and table name. "CREATE TABLE foo.bar (i INT PRIMARY KEY) STORED AS KUDU" will create a table in Kudu named "foo.bar". If the database is "default" then the Kudu table name will not include the database. 11) Several improvements in the grammar related to the family of CREATE TABLE statements. 12) Added new tests and modified existing Kudu test to use the new CREATE TABLE syntax. 13) Use Kudu master as the source of truth for table metadata insteads of HMS. Table/column metadata are still stored in HMS in order to be able to use table and column statistics. Not included in this commit: - Additional column properties such as nullability and compression encodings. Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 --- M be/src/catalog/catalog.cc M be/src/service/frontend.cc M bin/start-catalogd.sh M bin/start-impala-cluster.py M common/thrift/CatalogObjects.thrift M fe/src/main/cup/sql-parser.cup A fe/src/main/java/com/cloudera/impala/analysis/AnalysisUtils.java M fe/src/main/java/com/cloudera/impala/analysis/ColumnDef.java M fe/src/main/java/com/cloudera/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/com/cloudera/impala/analysis/CreateTableDataSrcStmt.java M fe/src/main/java/com/cloudera/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/com/cloudera/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java M fe/src/main/java/com/cloudera/impala/analysis/DistributeParam.java A fe/src/main/java/com/cloudera/impala/analysis/TableDataLayout.java A fe/src/main/java/com/cloudera/impala/analysis/TableDef.java A fe/src/main/java/com/cloudera/impala/analysis/TableDefOptions.java M fe/src/main/java/com/cloudera/impala/analysis/ToSqlUtils.java M fe/src/main/java/com/cloudera/impala/catalog/Catalog.java M fe/src/main/java/com/cloudera/impala/catalog/Db.java M fe/src/main/java/com/cloudera/impala/catalog/HdfsFileFormat.java M fe/src/main/java/com/cloudera/impala/catalog/ImpaladCatalog.java M fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java M fe/src/main/java/com/cloudera/impala/catalog/Table.java D fe/src/main/java/com/cloudera/impala/catalog/delegates/DdlDelegate.java D fe/src/main/java/com/cloudera/impala/catalog/delegates/KuduDdlDelegate.java D fe/src/main/java/com/cloudera/impala/catalog/delegates/UnsupportedOpDelegate.java M fe/src/main/java/com/cloudera/impala/plann