subject:"\[Impala\-ASF\-CR\] IMPALA\-3719\: Simplify CREATE TABLE statements with Kudu tables"

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-21 Thread Internal Jenkins (Code Review)

Internal Jenkins has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 13: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-21 Thread Internal Jenkins (Code Review)

Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

With this commit we simplify the syntax and handling of CREATE TABLE
statements for both managed and external Kudu tables.

Syntax example:
CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b))
DISTRIBUTE BY HASH (a) INTO 3 BUCKETS,
RANGE (b) SPLIT ROWS (('abc', 'def'))
STORED AS KUDU

Changes:
1) Remove the requirement to specify table properties such as key
   columns in tblproperties.
2) Read table schema (column definitions, primary keys, and distribution
   schemes) from Kudu instead of the HMS.
3) For external tables, the Kudu table is now required to exist at the
   time of creation in Impala.
4) Disallow table properties that could conflict with an existing
   table. Ex: key_columns cannot be specified.
5) Add KUDU as a file format.
6) Add a startup flag to impalad to specify the default Kudu master
   addresses. The flag is used as the default value for the table
   property kudu_master_addresses but it can still be overriden
   using TBLPROPERTIES.
7) Fix a post merge issue (IMPALA-3178) where DROP DATABASE CASCADE
   wasn't implemented for Kudu tables and silently ignored. The Kudu
   tables wouldn't be removed in Kudu.
8) Remove DDL delegates. There was only one functional delegate (for
   Kudu) the existence of the other delegate and the use of delegates in
   general has led to confusion. The Kudu delegate only exists to provide
   functionality missing from Hive.
9) Add PRIMARY KEY at the column and table level. This syntax is fairly
   standard. When used at the column level, only one column can be
   marked as a key. When used at the table level, multiple columns can
   be used as a key. Only Kudu tables are allowed to use PRIMARY KEY.
   The old "kudu.key_columns" table property is no longer accepted
   though it is still used internally. "PRIMARY" is now a keyword.
   The ident style declaration is used for "KEY" because it is also used
   for nested map types.
10) For managed tables, infer a Kudu table name if none was given.
   The table property "kudu.table_name" is optional for managed tables
   and is required for external tables. If for a managed table a Kudu
   table name is not provided, a table name will be generated based
   on the HMS database and table name.
11) Use Kudu master as the source of truth for table metadata instead
   of HMS when a table is loaded or refreshed. Table/column metadata
   are cached in the catalog and are stored in HMS in order to be
   able to use table and column statistics.

Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Reviewed-on: http://gerrit.cloudera.org:8080/4414
Reviewed-by: Alex Behm 
Tested-by: Internal Jenkins
---
M be/src/service/frontend.cc
M bin/start-catalogd.sh
M bin/start-impala-cluster.py
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableDataSrcStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/DistributeParam.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
A fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java
A fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
D fe/src/main/java/org/apache/impala/catalog/delegates/DdlDelegate.java
D fe/src/main/java/org/apache/impala/catalog/delegates/KuduDdlDelegate.java
D 
fe/src/main/java/org/apache/impala/catalog/delegates/UnsupportedOpDelegate.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionFilter.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniCata

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-20 Thread Alex Behm (Code Review)

Alex Behm has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 13: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-20 Thread Dimitris Tsirogiannis (Code Review)

Hello Michael Brown, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4414

to look at the new patch set (#13).

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..

IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

With this commit we simplify the syntax and handling of CREATE TABLE
statements for both managed and external Kudu tables.

Syntax example:
CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b))
DISTRIBUTE BY HASH (a) INTO 3 BUCKETS,
RANGE (b) SPLIT ROWS (('abc', 'def'))
STORED AS KUDU

Changes:
1) Remove the requirement to specify table properties such as key
   columns in tblproperties.
2) Read table schema (column definitions, primary keys, and distribution
   schemes) from Kudu instead of the HMS.
3) For external tables, the Kudu table is now required to exist at the
   time of creation in Impala.
4) Disallow table properties that could conflict with an existing
   table. Ex: key_columns cannot be specified.
5) Add KUDU as a file format.
6) Add a startup flag to impalad to specify the default Kudu master
   addresses. The flag is used as the default value for the table
   property kudu_master_addresses but it can still be overriden
   using TBLPROPERTIES.
7) Fix a post merge issue (IMPALA-3178) where DROP DATABASE CASCADE
   wasn't implemented for Kudu tables and silently ignored. The Kudu
   tables wouldn't be removed in Kudu.
8) Remove DDL delegates. There was only one functional delegate (for
   Kudu) the existence of the other delegate and the use of delegates in
   general has led to confusion. The Kudu delegate only exists to provide
   functionality missing from Hive.
9) Add PRIMARY KEY at the column and table level. This syntax is fairly
   standard. When used at the column level, only one column can be
   marked as a key. When used at the table level, multiple columns can
   be used as a key. Only Kudu tables are allowed to use PRIMARY KEY.
   The old "kudu.key_columns" table property is no longer accepted
   though it is still used internally. "PRIMARY" is now a keyword.
   The ident style declaration is used for "KEY" because it is also used
   for nested map types.
10) For managed tables, infer a Kudu table name if none was given.
   The table property "kudu.table_name" is optional for managed tables
   and is required for external tables. If for a managed table a Kudu
   table name is not provided, a table name will be generated based
   on the HMS database and table name.
11) Use Kudu master as the source of truth for table metadata instead
   of HMS when a table is loaded or refreshed. Table/column metadata
   are cached in the catalog and are stored in HMS in order to be
   able to use table and column statistics.

Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
---
M be/src/service/frontend.cc
M bin/start-catalogd.sh
M bin/start-impala-cluster.py
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableDataSrcStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/DistributeParam.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
A fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java
A fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
D fe/src/main/java/org/apache/impala/catalog/delegates/DdlDelegate.java
D fe/src/main/java/org/apache/impala/catalog/delegates/KuduDdlDelegate.java
D 
fe/src/main/java/org/apache/impala/catalog/delegates/UnsupportedOpDelegate.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionFilter.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniCatalog

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-20 Thread Dimitris Tsirogiannis (Code Review)

Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 12:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4414/12/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

Line 280: if (options_.fileFormat == THdfsFileFormat.KUDU) {
> This doesn't seem right. We used to be able to change the ROW FORMAT for TE
As we talked offline, I reverted this function to what used to be with the 
addition of an extra check for Kudu tables that don't support ROW FORMAT. Done


http://gerrit.cloudera.org:8080/#/c/4414/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

Line 1209:   org.apache.hadoop.hive.metastore.api.Table msTbl = 
existingTbl.getMetaStoreTable();
> Should we only do this if existingTbl is not loaded?
As we talked earlier, this handles the most common cases. If an HMS table is 
dropped externally, then we have no way of deleting the underlying Kudu table 
as the needed information is stored in the tblproperties of msTbl.


-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-20 Thread Alex Behm (Code Review)

Alex Behm has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 12:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4414/12/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

Line 280: if (options_.fileFormat == THdfsFileFormat.KUDU) {
This doesn't seem right. We used to be able to change the ROW FORMAT for TEXT 
and SEQUENCE. What was wrong with the previous code?


http://gerrit.cloudera.org:8080/#/c/4414/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

Line 1209:   org.apache.hadoop.hive.metastore.api.Table msTbl = 
existingTbl.getMetaStoreTable();
Should we only do this if existingTbl is not loaded?

Some possible inconsistency issues that we should probably figure out sometime:

The HMS table may have already been dropped, and then we'll not drop the 
corresponding Kudu table.


-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-19 Thread Dimitris Tsirogiannis (Code Review)

Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 12: Code-Review+1

Fixed some test loading issues and modified the dropTable() to properly handle 
missing Kudu tables. Running exhaustive tests.

-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-19 Thread Dimitris Tsirogiannis (Code Review)

Hello Michael Brown, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4414

to look at the new patch set (#12).

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..

IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

With this commit we simplify the syntax and handling of CREATE TABLE
statements for both managed and external Kudu tables.

Syntax example:
CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b))
DISTRIBUTE BY HASH (a) INTO 3 BUCKETS,
RANGE (b) SPLIT ROWS (('abc', 'def'))
STORED AS KUDU

Changes:
1) Remove the requirement to specify table properties such as key
   columns in tblproperties.
2) Read table schema (column definitions, primary keys, and distribution
   schemes) from Kudu instead of the HMS.
3) For external tables, the Kudu table is now required to exist at the
   time of creation in Impala.
4) Disallow table properties that could conflict with an existing
   table. Ex: key_columns cannot be specified.
5) Add KUDU as a file format.
6) Add a startup flag to impalad to specify the default Kudu master
   addresses. The flag is used as the default value for the table
   property kudu_master_addresses but it can still be overriden
   using TBLPROPERTIES.
7) Fix a post merge issue (IMPALA-3178) where DROP DATABASE CASCADE
   wasn't implemented for Kudu tables and silently ignored. The Kudu
   tables wouldn't be removed in Kudu.
8) Remove DDL delegates. There was only one functional delegate (for
   Kudu) the existence of the other delegate and the use of delegates in
   general has led to confusion. The Kudu delegate only exists to provide
   functionality missing from Hive.
9) Add PRIMARY KEY at the column and table level. This syntax is fairly
   standard. When used at the column level, only one column can be
   marked as a key. When used at the table level, multiple columns can
   be used as a key. Only Kudu tables are allowed to use PRIMARY KEY.
   The old "kudu.key_columns" table property is no longer accepted
   though it is still used internally. "PRIMARY" is now a keyword.
   The ident style declaration is used for "KEY" because it is also used
   for nested map types.
10) For managed tables, infer a Kudu table name if none was given.
   The table property "kudu.table_name" is optional for managed tables
   and is required for external tables. If for a managed table a Kudu
   table name is not provided, a table name will be generated based
   on the HMS database and table name.
11) Use Kudu master as the source of truth for table metadata instead
   of HMS when a table is loaded or refreshed. Table/column metadata
   are cached in the catalog and are stored in HMS in order to be
   able to use table and column statistics.

Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
---
M be/src/service/frontend.cc
M bin/start-catalogd.sh
M bin/start-impala-cluster.py
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableDataSrcStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/DistributeParam.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
A fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java
A fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
D fe/src/main/java/org/apache/impala/catalog/delegates/DdlDelegate.java
D fe/src/main/java/org/apache/impala/catalog/delegates/KuduDdlDelegate.java
D 
fe/src/main/java/org/apache/impala/catalog/delegates/UnsupportedOpDelegate.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionFilter.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniCatalog

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-18 Thread Alex Behm (Code Review)

Alex Behm has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 11: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-18 Thread Dimitris Tsirogiannis (Code Review)

Hello Michael Brown,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4414

to look at the new patch set (#11).

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..

IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

With this commit we simplify the syntax and handling of CREATE TABLE
statements for both managed and external Kudu tables.

Syntax example:
CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b))
DISTRIBUTE BY HASH (a) INTO 3 BUCKETS,
RANGE (b) SPLIT ROWS (('abc', 'def'))
STORED AS KUDU

Changes:
1) Remove the requirement to specify table properties such as key
   columns in tblproperties.
2) Read table schema (column definitions, primary keys, and distribution
   schemes) from Kudu instead of the HMS.
3) For external tables, the Kudu table is now required to exist at the
   time of creation in Impala.
4) Disallow table properties that could conflict with an existing
   table. Ex: key_columns cannot be specified.
5) Add KUDU as a file format.
6) Add a startup flag to impalad to specify the default Kudu master
   addresses. The flag is used as the default value for the table
   property kudu_master_addresses but it can still be overriden
   using TBLPROPERTIES.
7) Fix a post merge issue (IMPALA-3178) where DROP DATABASE CASCADE
   wasn't implemented for Kudu tables and silently ignored. The Kudu
   tables wouldn't be removed in Kudu.
8) Remove DDL delegates. There was only one functional delegate (for
   Kudu) the existence of the other delegate and the use of delegates in
   general has led to confusion. The Kudu delegate only exists to provide
   functionality missing from Hive.
9) Add PRIMARY KEY at the column and table level. This syntax is fairly
   standard. When used at the column level, only one column can be
   marked as a key. When used at the table level, multiple columns can
   be used as a key. Only Kudu tables are allowed to use PRIMARY KEY.
   The old "kudu.key_columns" table property is no longer accepted
   though it is still used internally. "PRIMARY" is now a keyword.
   The ident style declaration is used for "KEY" because it is also used
   for nested map types.
10) For managed tables, infer a Kudu table name if none was given.
   The table property "kudu.table_name" is optional for managed tables
   and is required for external tables. If for a managed table a Kudu
   table name is not provided, a table name will be generated based
   on the HMS database and table name.
11) Use Kudu master as the source of truth for table metadata instead
   of HMS when a table is loaded or refreshed. Table/column metadata
   are cached in the catalog and are stored in HMS in order to be
   able to use table and column statistics.

Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
---
M be/src/service/frontend.cc
M bin/start-catalogd.sh
M bin/start-impala-cluster.py
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableDataSrcStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/DistributeParam.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
A fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java
A fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
D fe/src/main/java/org/apache/impala/catalog/delegates/DdlDelegate.java
D fe/src/main/java/org/apache/impala/catalog/delegates/KuduDdlDelegate.java
D 
fe/src/main/java/org/apache/impala/catalog/delegates/UnsupportedOpDelegate.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionFilter.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-18 Thread Dimitris Tsirogiannis (Code Review)

Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4414/7/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

Line 358:   1: required list values
> Why add a new special purpose structure that is redundant with TExpr? To me
If that's ok, I'll defer that for now. There are a bunch of stuff in KuduUtil 
related to split rows that make use of TRangeLiteral which will btw change in 
the range partitioning patch. So, I'll fix it there for good. Added a TODO.


-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-18 Thread Dimitris Tsirogiannis (Code Review)

Hello Michael Brown,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4414

to look at the new patch set (#10).

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..

IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

With this commit we simplify the syntax and handling of CREATE TABLE
statements for both managed and external Kudu tables.

Syntax example:
CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b))
DISTRIBUTE BY HASH (a) INTO 3 BUCKETS,
RANGE (b) SPLIT ROWS (('abc', 'def'))
STORED AS KUDU

Changes:
1) Remove the requirement to specify table properties such as key
   columns in tblproperties.
2) Read table schema (column definitions, primary keys, and distribution
   schemes) from Kudu instead of the HMS.
3) For external tables, the Kudu table is now required to exist at the
   time of creation in Impala.
4) Disallow table properties that could conflict with an existing
   table. Ex: key_columns cannot be specified.
5) Add KUDU as a file format.
6) Add a startup flag to impalad to specify the default Kudu master
   addresses. The flag is used as the default value for the table
   property kudu_master_addresses but it can still be overriden
   using TBLPROPERTIES.
7) Fix a post merge issue (IMPALA-3178) where DROP DATABASE CASCADE
   wasn't implemented for Kudu tables and silently ignored. The Kudu
   tables wouldn't be removed in Kudu.
8) Remove DDL delegates. There was only one functional delegate (for
   Kudu) the existence of the other delegate and the use of delegates in
   general has led to confusion. The Kudu delegate only exists to provide
   functionality missing from Hive.
9) Add PRIMARY KEY at the column and table level. This syntax is fairly
   standard. When used at the column level, only one column can be
   marked as a key. When used at the table level, multiple columns can
   be used as a key. Only Kudu tables are allowed to use PRIMARY KEY.
   The old "kudu.key_columns" table property is no longer accepted
   though it is still used internally. "PRIMARY" is now a keyword.
   The ident style declaration is used for "KEY" because it is also used
   for nested map types.
10) For managed tables, infer a Kudu table name if none was given.
   The table property "kudu.table_name" is optional for managed tables
   and is required for external tables. If for a managed table a Kudu
   table name is not provided, a table name will be generated based
   on the HMS database and table name.
11) Use Kudu master as the source of truth for table metadata instead
   of HMS when a table is loaded or refreshed. Table/column metadata
   are cached in the catalog and are stored in HMS in order to be
   able to use table and column statistics.

Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
---
M be/src/service/frontend.cc
M bin/start-catalogd.sh
M bin/start-impala-cluster.py
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableDataSrcStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/DistributeParam.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
A fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java
A fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
D fe/src/main/java/org/apache/impala/catalog/delegates/DdlDelegate.java
D fe/src/main/java/org/apache/impala/catalog/delegates/KuduDdlDelegate.java
D 
fe/src/main/java/org/apache/impala/catalog/delegates/UnsupportedOpDelegate.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionFilter.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-18 Thread Dimitris Tsirogiannis (Code Review)

Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 9: Code-Review+1

(9 comments)

Carry mikeb's +1 for python tests.

http://gerrit.cloudera.org:8080/#/c/4414/7/be/src/service/frontend.cc
File be/src/service/frontend.cc:

Line 62: "value should be a comma separated list of hostnames or IP 
addresses.");
> Add that detail to the description.
Done


http://gerrit.cloudera.org:8080/#/c/4414/9/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

Line 1047: // Used for creating tables where the schema is inferred externally, 
e.g., from an Avro 
> whitespace
Done


http://gerrit.cloudera.org:8080/#/c/4414/7/fe/src/main/java/org/apache/impala/analysis/DistributeParam.java
File fe/src/main/java/org/apache/impala/analysis/DistributeParam.java:

Line 105: for (String colName: colNames_) {
> I was wondering whether we should allow something like DISTRIBUTED BY(a, a,
Actually this case is rejected by Kudu. Added a test in kudu_create.test.


http://gerrit.cloudera.org:8080/#/c/4414/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

Line 1350: // CTAS in an external Kudu table
> I was thinking it would mean the same thing it does for HDFS tables. After 
I see. Let's defer this to a follow up patch if you don't mind. Besides there 
are other CREATE TABLE statements that we need to revisit for Kudu such as 
CREATE TABLE LIKE. As is right now, this change would require non-substantial 
changes to the way we analyze and execute this statement. Filed a JIRA and 
added a TODO.


http://gerrit.cloudera.org:8080/#/c/4414/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

Line 1719: // bucket is partitioned into 4 tables based on the split points 
of 'y'.
> tables == tablets?
:) yes. Done


http://gerrit.cloudera.org:8080/#/c/4414/6/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:

Line 203:   if row_format and file_format == 'text':
> There are a few examples here:
That's fine. I changed it to be on the safe side.


http://gerrit.cloudera.org:8080/#/c/4414/9/testdata/workloads/functional-query/queries/QueryTest/kudu_create.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_create.test:

Line 82: create table `add`(`analytic` int primary key) distribute by hash 
(`analytic`)
> also cover the range() clause in this same test
Done


http://gerrit.cloudera.org:8080/#/c/4414/7/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

Line 204: """
> File JIRA and defer.
Done KUDU-1711


Line 213: )
> Thanks for the explanation, seems fine.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-18 Thread Michael Brown (Code Review)

Michael Brown has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 9: Code-Review+1

for the python tests

-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-18 Thread Alex Behm (Code Review)

Alex Behm has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 9:

(3 comments)

I'll wait for the next change set and then do a final round over everything.

http://gerrit.cloudera.org:8080/#/c/4414/9/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

Line 1047: // Used for creating tables where the schema is inferred externally, 
e.g., from an Avro 
whitespace


http://gerrit.cloudera.org:8080/#/c/4414/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

Line 1719: // bucket is partitioned into 4 tables based on the split points 
of 'y'.
tables == tablets?


http://gerrit.cloudera.org:8080/#/c/4414/9/testdata/workloads/functional-query/queries/QueryTest/kudu_create.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_create.test:

Line 82: create table `add`(`analytic` int primary key) distribute by hash 
(`analytic`)
also cover the range() clause in this same test


-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-18 Thread Alex Behm (Code Review)

Alex Behm has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 7:

(10 comments)

Responses to comments. Doing another round now.

http://gerrit.cloudera.org:8080/#/c/4414/7/be/src/service/frontend.cc
File be/src/service/frontend.cc:

Line 62: "value should be a comma separated list of hostnames or IP 
addresses.");
> Ports are optional.
Add that detail to the description.


http://gerrit.cloudera.org:8080/#/c/4414/7/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

Line 358:   1: required list values
> Hm, TExpr? That sounds a bit too much for simple literal values unless we e
Why add a new special purpose structure that is redundant with TExpr? To me a 
TExpr is pretty clear, but without looking at the definition, I don't really 
know what a TRangeLiteral is.


http://gerrit.cloudera.org:8080/#/c/4414/7/fe/src/main/java/org/apache/impala/analysis/DistributeParam.java
File fe/src/main/java/org/apache/impala/analysis/DistributeParam.java:

Line 105: for (String colName: colNames_) {
> Not sure I follow. Are we doing something to prevent this?
I was wondering whether we should allow something like DISTRIBUTED BY(a, a, a, 
a, a) or whether we should reject that because 'a' is mentioned multiple times. 
We reject such cases for PKs.


http://gerrit.cloudera.org:8080/#/c/4414/7/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java:

Line 231: } catch (Exception e) {
> I don't think we can differentiate these two given the way the Kudu client 
Got it, thanks!


http://gerrit.cloudera.org:8080/#/c/4414/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

Line 1350: // CTAS in an external Kudu table
> Hm, what does it really mean to allow CTAS on external Kudu table? We have 
I was thinking it would mean the same thing it does for HDFS tables. After the 
CTAS we have a new external table with a schema according to the query 
statement. The fact that it's a Kudu and not an HDFS table seems like an 
insignificant detail from a user's point of view. In other words, at a 
high-level it's not very clear to me why Kudu would behave any different than 
HDFS.

I get that the implementation may be weird. Let's revisit in a follow-on 
change, if necessary.


Line 1739: AnalysisError(String.format("create table tab (x int) distribute 
by hash (x) " +
> Well these were never specified in TBLPROPERTIES so I am not sure if we sho
I see. Most of the metadata is in Kudu now. Nice!


http://gerrit.cloudera.org:8080/#/c/4414/6/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:

Line 203:   if row_format and file_format == 'text':
> Hm, I couldn't find anything in the docs or in our tests.
There are a few examples here:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL


No idea whether we'll actually do the right thing


http://gerrit.cloudera.org:8080/#/c/4414/7/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

Line 204: CREATE TABLE {table} (c INT PRIMARY KEY)
> That's interesting. Actually, currently don't work because we can't persist
File JIRA and defer.


Line 213: TBLPROPERTIES 
('kudu.master_addresses'='{kudu_addr}')""".format(
> Good question. If the user doesn't specify the kudu master_addresses we pop
Thanks for the explanation, seems fine.


Line 226: DISTRIBUTE BY HASH (c) INTO 3 BUCKETS, RANGE (c) SPLIT ROWS 
(...)
> No, but Kudu's API doesn't currently return any information about range par
Got it, thanks.


-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-18 Thread Dimitris Tsirogiannis (Code Review)

Dimitris Tsirogiannis has uploaded a new patch set (#9).

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..

IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

With this commit we simplify the syntax and handling of CREATE TABLE
statements for both managed and external Kudu tables.

Syntax example:
CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b))
DISTRIBUTE BY HASH (a) INTO 3 BUCKETS,
RANGE (b) SPLIT ROWS (('abc', 'def'))
STORED AS KUDU

Changes:
1) Remove the requirement to specify table properties such as key
   columns in tblproperties.
2) Read table schema (column definitions, primary keys, and distribution
   schemes) from Kudu instead of the HMS.
3) For external tables, the Kudu table is now required to exist at the
   time of creation in Impala.
4) Disallow table properties that could conflict with an existing
   table. Ex: key_columns cannot be specified.
5) Add KUDU as a file format.
6) Add a startup flag to impalad to specify the default Kudu master
   addresses. The flag is used as the default value for the table
   property kudu_master_addresses but it can still be overriden
   using TBLPROPERTIES.
7) Fix a post merge issue (IMPALA-3178) where DROP DATABASE CASCADE
   wasn't implemented for Kudu tables and silently ignored. The Kudu
   tables wouldn't be removed in Kudu.
8) Remove DDL delegates. There was only one functional delegate (for
   Kudu) the existence of the other delegate and the use of delegates in
   general has led to confusion. The Kudu delegate only exists to provide
   functionality missing from Hive.
9) Add PRIMARY KEY at the column and table level. This syntax is fairly
   standard. When used at the column level, only one column can be
   marked as a key. When used at the table level, multiple columns can
   be used as a key. Only Kudu tables are allowed to use PRIMARY KEY.
   The old "kudu.key_columns" table property is no longer accepted
   though it is still used internally. "PRIMARY" is now a keyword.
   The ident style declaration is used for "KEY" because it is also used
   for nested map types.
10) For managed tables, infer a Kudu table name if none was given.
   The table property "kudu.table_name" is optional for managed tables
   and is required for external tables. If for a managed table a Kudu
   table name is not provided, a table name will be generated based
   on the HMS database and table name.
11) Use Kudu master as the source of truth for table metadata instead
   of HMS when a table is loaded or refreshed. Table/column metadata
   are cached in the catalog and are stored in HMS in order to be
   able to use table and column statistics.

Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
---
M be/src/service/frontend.cc
M bin/start-catalogd.sh
M bin/start-impala-cluster.py
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableDataSrcStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/DistributeParam.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
A fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java
A fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
D fe/src/main/java/org/apache/impala/catalog/delegates/DdlDelegate.java
D fe/src/main/java/org/apache/impala/catalog/delegates/KuduDdlDelegate.java
D 
fe/src/main/java/org/apache/impala/catalog/delegates/UnsupportedOpDelegate.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionFilter.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
A fe/src/main/java/org/apache/impala/

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-18 Thread Dimitris Tsirogiannis (Code Review)

Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..

Patch Set 5:

(4 comments)

A few minor clarifications.

http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/catalog/KuduTable.java
File fe/src/main/java/org/apache/impala/catalog/KuduTable.java:

Line 195: msClient.alter_table(msTable_.getDbName(),
msTable_.getTableName(), msTable_);
> Right, most of the time there will be no changes. I'm not only worried abou
In theory, it should be ok given that there is a single write path to HMS for
Kudu table schema. If that changes, I agree we may have to deal with some weird
inconsistent states.

Line 216: String colName = ToSqlUtils.getIdentSql(colSchema.getName());
> We don't parse the column names in the HMS backend table and neither does H
Yeah, you're right. Fixed it in a follow up patch.

http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

Line 1151: // the metadata must be fetched from the Hive Metastore.
> Makes sense now that you've explained, please leave a brief comment.
Done

http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java:

Line 85: kudu.createTable(kuduTableName, schema, tableOpts);
> Agree. Can you file a JIRA against Kudu? Thanks!
Done

--
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Marcel Kornacker
Gerrit-Reviewer: Matthew Jacobs
Gerrit-Reviewer: Michael Brown
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-18 Thread Dimitris Tsirogiannis (Code Review)

Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 7:

(63 comments)

Still a few minor asks to go through (e.g. add JIRA numbers and few 
clarifications) but I am sending the next patch.

http://gerrit.cloudera.org:8080/#/c/4414/7/be/src/service/frontend.cc
File be/src/service/frontend.cc:

Line 62: "value should be a comma separated list of hostnames or IP 
addresses.");
> are ports optional or mandatory?
Ports are optional.


http://gerrit.cloudera.org:8080/#/c/4414/7/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

Line 358:   1: required list values
> Why not a list of TExpr that are expected to be literals? Seems more future
Hm, TExpr? That sounds a bit too much for simple literal values unless we 
envision using exprs, udfs, etc. Thoughts?


Line 368: // the type parameter.
> which type parameter?
Sorry, there was an enum that got removed and I forgot to update the comments. 
Done


http://gerrit.cloudera.org:8080/#/c/4414/7/common/thrift/JniCatalog.thrift
File common/thrift/JniCatalog.thrift:

Line 398:   14: optional list distribute_by;
> for consistency let's remove trailing ";"
Done


http://gerrit.cloudera.org:8080/#/c/4414/7/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

Line 1033: // class doesn't inherit from CreateTableStmt. 
> whitespace
Done


Line 1047: // Used for creating external Kudu tables for which the schema is 
loaded from Kudu.
> There seem to be more uses of this production, so this comment could be mis
Done


Line 1112: // or one RANGE clause
> typo: clauses
Done


http://gerrit.cloudera.org:8080/#/c/4414/7/fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
File fe/src/main/java/org/apache/impala/analysis/ColumnDef.java:

Line 93:   void setIsPrimaryKey() { isPrimaryKey_ = true; }
> do we need this?
No, removed.


Line 191:   
Preconditions.checkState(!colDefsByColName.containsKey(colDef.getColName()));
> can check return value of put()
Good point. Keep forgetting this. Done


http://gerrit.cloudera.org:8080/#/c/4414/7/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java:

Line 208:* Analyzes and checks table properties which are common for both 
managed and external
> typo: common to
Done


Line 255: "PARTITIONED BY cannot be used with an Kudu table.");
> typo: a Kudu table
Done


Line 273: AnalysisUtils.throwIfNullOrEmpty(getPrimaryKeyColumnDefs(),
> Shouldn't this check hasPrimaryKey()?
Done


Line 284: "zero. Given number of replicas is: " + r.toString() + 
".'");
> remove trailing .' or add the opening single-quote
Done


Line 318:   private boolean hasPrimaryKey() {
> Isn't it enough to check primaryKeyColDefs_ in tableDef_?
You're right, we can do that since tableDef_ is analyzed.


http://gerrit.cloudera.org:8080/#/c/4414/7/fe/src/main/java/org/apache/impala/analysis/DistributeParam.java
File fe/src/main/java/org/apache/impala/analysis/DistributeParam.java:

Line 105: for (String colName: colNames_) {
> we could specify the same distribution column multiple times
Not sure I follow. Are we doing something to prevent this?


Line 127: throw new AnalysisException("Split values cannot be 
NULL");
> do we have a test for this?
Hm no, added one. Thanks


Line 223: colNames_.addAll(colNames);
> do we need toLower()?
We've already lowered the column names before the call to setColumnNames().


http://gerrit.cloudera.org:8080/#/c/4414/7/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

Line 69:   // Populated during analysis.
> Authoritative list of primary key column definitions populated during analy
Done


Line 177: fqTableName_.analyze();
> Do you know if Kudu has more permissive or more restrictive constraints on 
Agreed. Will do.


Line 181: if (analyzer.dbContainsTable(getTblName().getDb(), getTbl(), 
Privilege.CREATE)
> Are we going to check Sentry privs for Kudu tables? Also ok to defer this f
It's still being discussed :). Lineage as well.


Line 220:* Analyzes the primary key columns. Primary keys are only 
supported for Kudu
> Replace the second sentence with a brief description what this checks. It d
Done


Line 234:   StringBuilder columnDefStr = new StringBuilder();
> Not used?
Done


http://gerrit.cloudera.org:8080/#/c/4414/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

Line 1117: if (db != null && params.cascade) dropTablesFromKudu(db);
> I think it might be a good idea to do this under the metastoreDdlLock_ as w
Done


http://gerrit.cloudera.org:8080/#/c/4414/7/fe/src/main/java/org/apache/impala/service/KuduCatalogO

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-18 Thread Dimitris Tsirogiannis (Code Review)

Dimitris Tsirogiannis has uploaded a new patch set (#8).

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..

IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

With this commit we simplify the syntax and handling of CREATE TABLE
statements for both managed and external Kudu tables.

Syntax example:
CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b))
DISTRIBUTE BY HASH (a) INTO 3 BUCKETS,
RANGE (b) SPLIT ROWS (('abc', 'def'))
STORED AS KUDU

Changes:
1) Remove the requirement to specify table properties such as key
   columns in tblproperties.
2) Read table schema (column definitions, primary keys, and distribution
   schemes) from Kudu instead of the HMS.
3) For external tables, the Kudu table is now required to exist at the
   time of creation in Impala.
4) Disallow table properties that could conflict with an existing
   table. Ex: key_columns cannot be specified.
5) Add KUDU as a file format.
6) Add a startup flag to impalad to specify the default Kudu master
   addresses. The flag is used as the default value for the table
   property kudu_master_addresses but it can still be overriden
   using TBLPROPERTIES.
7) Fix a post merge issue (IMPALA-3178) where DROP DATABASE CASCADE
   wasn't implemented for Kudu tables and silently ignored. The Kudu
   tables wouldn't be removed in Kudu.
8) Remove DDL delegates. There was only one functional delegate (for
   Kudu) the existence of the other delegate and the use of delegates in
   general has led to confusion. The Kudu delegate only exists to provide
   functionality missing from Hive.
9) Add PRIMARY KEY at the column and table level. This syntax is fairly
   standard. When used at the column level, only one column can be
   marked as a key. When used at the table level, multiple columns can
   be used as a key. Only Kudu tables are allowed to use PRIMARY KEY.
   The old "kudu.key_columns" table property is no longer accepted
   though it is still used internally. "PRIMARY" is now a keyword.
   The ident style declaration is used for "KEY" because it is also used
   for nested map types.
10) For managed tables, infer a Kudu table name if none was given.
   The table property "kudu.table_name" is optional for managed tables
   and is required for external tables. If for a managed table a Kudu
   table name is not provided, a table name will be generated based
   on the HMS database and table name.
11) Use Kudu master as the source of truth for table metadata instead
   of HMS when a table is loaded or refreshed. Table/column metadata
   are cached in the catalog and are stored in HMS in order to be
   able to use table and column statistics.

Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
---
M be/src/service/frontend.cc
M bin/start-catalogd.sh
M bin/start-impala-cluster.py
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableDataSrcStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/DistributeParam.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
A fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java
A fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
D fe/src/main/java/org/apache/impala/catalog/delegates/DdlDelegate.java
D fe/src/main/java/org/apache/impala/catalog/delegates/KuduDdlDelegate.java
D 
fe/src/main/java/org/apache/impala/catalog/delegates/UnsupportedOpDelegate.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionFilter.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
A fe/src/main/java/org/apache/impala/

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-17 Thread Alex Behm (Code Review)

Alex Behm has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 7:

(26 comments)

Next round over the code.

http://gerrit.cloudera.org:8080/#/c/4414/7/be/src/service/frontend.cc
File be/src/service/frontend.cc:

Line 62: "value should be a comma separated list of hostnames or IP 
addresses.");
are ports optional or mandatory?


http://gerrit.cloudera.org:8080/#/c/4414/7/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

Line 358:   1: required list values
Why not a list of TExpr that are expected to be literals? Seems more future 
proof.


Line 368: // the type parameter.
which type parameter?


http://gerrit.cloudera.org:8080/#/c/4414/7/common/thrift/JniCatalog.thrift
File common/thrift/JniCatalog.thrift:

Line 398:   14: optional list distribute_by;
for consistency let's remove trailing ";"


http://gerrit.cloudera.org:8080/#/c/4414/7/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

Line 1033: // class doesn't inherit from CreateTableStmt. 
whitespace


Line 1047: // Used for creating external Kudu tables for which the schema is 
loaded from Kudu.
There seem to be more uses of this production, so this comment could be 
misleading. Maybe generalize to something like
"Used for creating tables where the schema is inferred externally, e.g., from 
an Avro schema, Kudu table or query statement."


Line 1112: // or one RANGE clause
typo: clauses


http://gerrit.cloudera.org:8080/#/c/4414/7/fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
File fe/src/main/java/org/apache/impala/analysis/ColumnDef.java:

Line 93:   void setIsPrimaryKey() { isPrimaryKey_ = true; }
do we need this?


Line 191:   
Preconditions.checkState(!colDefsByColName.containsKey(colDef.getColName()));
can check return value of put()


http://gerrit.cloudera.org:8080/#/c/4414/7/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java:

Line 208:* Analyzes and checks table properties which are common for both 
managed and external
typo: common to


Line 255: "PARTITIONED BY cannot be used with an Kudu table.");
typo: a Kudu table

this also needs to be checked for managed tables right?


Line 273: AnalysisUtils.throwIfNullOrEmpty(getPrimaryKeyColumnDefs(),
Shouldn't this check hasPrimaryKey()?


Line 284: "zero. Given number of replicas is: " + r.toString() + 
".'");
remove trailing .' or add the opening single-quote


Line 318:   private boolean hasPrimaryKey() {
Isn't it enough to check primaryKeyColDefs_ in tableDef_?


http://gerrit.cloudera.org:8080/#/c/4414/7/fe/src/main/java/org/apache/impala/analysis/DistributeParam.java
File fe/src/main/java/org/apache/impala/analysis/DistributeParam.java:

Line 105: for (String colName: colNames_) {
we could specify the same distribution column multiple times


Line 127: throw new AnalysisException("Split values cannot be 
NULL");
do we have a test for this?


Line 223: colNames_.addAll(colNames);
do we need toLower()?


http://gerrit.cloudera.org:8080/#/c/4414/7/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

Line 69:   // Populated during analysis.
Authoritative list of primary key column definitions populated during analysis.


Line 177: fqTableName_.analyze();
Do you know if Kudu has more permissive or more restrictive constraints on what 
strings can be used as table/column names? I'd be surprised if HMS and Kudu 
were identical in that respect. Better to file a JIRA and leave that 
investigation/fix out of this patch.


Line 181: if (analyzer.dbContainsTable(getTblName().getDb(), getTbl(), 
Privilege.CREATE)
Are we going to check Sentry privs for Kudu tables? Also ok to defer this fix, 
but let's not forget.


Line 220:* Analyzes the primary key columns. Primary keys are only 
supported for Kudu
Replace the second sentence with a brief description what this checks. It does 
not check the format and succeeds if no primary keys are given, so there is 
nothing Kudu specific here (that is checked in CreateTableStmt).


Line 234:   StringBuilder columnDefStr = new StringBuilder();
Not used?


http://gerrit.cloudera.org:8080/#/c/4414/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

Line 1117: if (db != null && params.cascade) dropTablesFromKudu(db);
I think it might be a good idea to do this under the metastoreDdlLock_ as well 
to ensure that the Kudu and HMS table deletions are atomic.


http://gerrit.cloudera.org:8080/#/c/4414/7/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java:

Line 47: /*

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-17 Thread Alex Behm (Code Review)

Alex Behm has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 5:

(8 comments)

Responses to comments. Starting next round on code.

http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/catalog/KuduTable.java
File fe/src/main/java/org/apache/impala/catalog/KuduTable.java:

Line 195:   msClient.alter_table(msTable_.getDbName(), 
msTable_.getTableName(), msTable_);
> Good point. We can potentially avoid this by checking if there has been som
Right, most of the time there will be no changes. I'm not only worried about 
the perf, but all the extra weird states we can be in when doing this extra RPC 
to modify the HMS.


Line 216:   String colName = ToSqlUtils.getIdentSql(colSchema.getName());
> Hm, can you explain why this is the case given that we will use this name t
We don't parse the column names in the HMS backend table and neither does Hive. 
The backticks are only needed to tell our parser to interpret a token as an 
identifier, but the parser is never invoked for column names from the HMS.

I recommend trying this out with some Impala keywords to see what happens.


Line 223: numClusteringCols_ = primaryKeyColumnNames_.size();
> Actually, I removed this and left a TODO. There are a few places that inter
Yea, there's definitely some inconsistency here.


http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

Line 1151: // the metadata must be fetched from the Hive Metastore.
> In order to drop a table from Kudu we need the Kudu table name which is sto
Makes sense now that you've explained, please leave a brief comment.


Line 1172:   if (!KuduTable.isKuduTable(msTable)) continue;
> Well, it's an easy check we can do to avoid RPC calls to Kudu for dropping 
Got it, thanks!


http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java:

Line 85:   kudu.createTable(kuduTableName, schema, tableOpts);
> I see what you're saying but I don't think this will work. We need to be ab
Agree. Can you file a JIRA against Kudu? Thanks!


Line 178:   if (kudu.tableExists(tableName)) {
> I get it. See my comment above and let me know what you think.
I agree with you.


Line 214: String colName = ToSqlUtils.getIdentSql(colSchema.getName());
> I'd appreciate it if you could explain why is it wrong to use this here and
Explained in other comment, let me know if still unclear :)


-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-17 Thread Alex Behm (Code Review)

Alex Behm has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 7:

(29 comments)

http://gerrit.cloudera.org:8080/#/c/4414/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

Line 1350: // CTAS in an external Kudu table
What's the rationale for not allowing this?


Line 1716: AnalyzesOk("create table tab (x int, y string, primary key(x, 
y)) " +
Can you add a brief comment explaining the table layout for this one?


Line 1717: "distribute by hash(x) into 3 buckets, range(y) split rows " 
+
can we distribute by range, hash?


Line 1723: AnalyzesOk("create table tab (a int, b int, c int, d int, 
primary key (a, b, c))" +
What happens if I specify PARTITION BY for a Kudu table?


Line 1739: AnalysisError(String.format("create table tab (x int) distribute 
by hash (x) " +
What about specifying the other params in tblproperties like the distribute by 
and split rows.


Line 1748: AnalysisError("create table tab (x int primary key, primary 
key(x)) stored " +
Add negative test where two ColumnDefs are declared as PK.


Line 1762: AnalysisError("create table tab (a int, b int, c int, d int, 
primary key(a, b, c)) " +
Do we have tests somewhere for checking which types are supported with Kudu? We 
should make sure that:
* you can create a table with all supported types (and same for the specific 
clauses like primary key, distributed by, etc.)
* you cannot create tables with unsupported types like TIMESTAMP/DECIMAL and 
nested types (should fail gracefully with "not supported")
* or alternatively, if we want to defer the type checks to Kudu (and not bake 
it into Impala analysis), then we should document that somewhere


Line 1772: // No float split keys
Even if the column type is float? If so, might want to test that.


Line 1810: // DISTRIBUTE BY is required for managed tables.
Primary key is also required, do we have a test?


Line 1812: "Table partitioning must be specified for managed Kudu 
tables.");
Let's rephrase this as "Table distribution must be specified ..."


PS6, Line 1828: AnalysisError("create external table t tblproperties (" +
Not that it makes any sense, but hat happens with:

create external table t (primary key()) ...


Line 1863:   public void TestCreateAvroTest() {
Add a test with some of the Kudu-specific clauses and STORED AS AVRO


Line 2822:   public void TestShowFiles() throws AnalysisException {
DO we have a TODO/JIRA somewhere to go through all DDL/SHOW commands and make 
them behave properly for Kudu?


http://gerrit.cloudera.org:8080/#/c/4414/6/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
File fe/src/test/java/org/apache/impala/analysis/ParserTest.java:

Line 1632
Why remove this? It will break my setup :)


Line 2235: ParsesOk("CREATE TABLE foo (i INT PRIMARY KEY, PRIMARY KEY(i)) 
STORED AS KUDU");
Add case like this:
CREATE TABLE foo (i INT PRIMARY KEY, j PRIMARY KEY) STORED AS KUDU


Line 2541: ParsesOk("CREATE TABLE Foo TBLPROPERTIES ('a'='b', 'c'='d') AS 
SELECT * from bar");
nit: let's be consistent with how we chop up string literals across lines 
(space at end of previous line xor beginning of next line)


http://gerrit.cloudera.org:8080/#/c/4414/6/fe/src/test/java/org/apache/impala/catalog/HdfsStorageDescriptorTest.java
File fe/src/test/java/org/apache/impala/catalog/HdfsStorageDescriptorTest.java:

Line 58: "org.apache.impala.hive.serde.ParquetInputFormat",
revert


http://gerrit.cloudera.org:8080/#/c/4414/6/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:

Line 203:   if row_format and file_format == 'text':
I think row_format is also valid for sequence files (maybe we're not using it 
though)


http://gerrit.cloudera.org:8080/#/c/4414/6/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

Line 823: distribute by range(id) split rows ((1003), (1007)) stored as kudu;
ah so much better


http://gerrit.cloudera.org:8080/#/c/4414/6/testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
File 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test:

Line 218: create table kudu_table_clone like functional_kudu.alltypes
Why can't we check this in analysis?


http://gerrit.cloudera.org:8080/#/c/4414/6/testdata/workloads/functional-query/queries/QueryTest/kudu_create.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_create.test:

Line 3: # This test file contains several cases for what basically amount to 
analysis errors,
Should mention that some of this behavior is intentional and why.


Line 11: ImpalaRuntimeException: Type TIMESTAMP is not supported in Kudu
Do we get the same message

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-17 Thread Dimitris Tsirogiannis (Code Review)

Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 5:

(41 comments)

http://gerrit.cloudera.org:8080/#/c/4414/5//COMMIT_MSG
Commit Message:

Line 45:"KEY" is expected to be common enough that the ident style
> Might also want to mention that "key" is used for nested map types, so it's
Done


Line 51:on the HMS database and table name. If the database is "default",
> Does this last regarding "default" make the code easier or more complicated
Not sure why the special case. Removing it unless someone objects.


http://gerrit.cloudera.org:8080/#/c/4414/5/be/src/service/frontend.cc
File be/src/service/frontend.cc:

Line 61: DEFINE_string(kudu_master_addrs, "", "Specifies the default Kudu 
master(s). The given "
> Can we make this more consistent with out existing options? For example:
Done


http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
File fe/src/main/java/org/apache/impala/analysis/ColumnDef.java:

Line 176:   static List toColumnNames(Collection columnDefs) 
{
> colDefs
Done


Line 184:   static Map 
mapByColumnNames(Collection colDefs) {
> comment that colDefs is assumed have no duplicate names, or alternatively d
Done


http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java:

Line 99:   "does not support (%s) file format. Supported formats are: 
(%s)",
> the (%s) file format.
Done


Line 100:   createStmt_.getFileFormat().toString().replace("_", ""),
> what's with this weird "_" replacement?
I believe the intention was to make the file format name from THdfsFileFormat 
consistent with the file format name as specified in a "STORED AS" clause. 
RC_FILE is RCFILE, TEXT is TEXTFILE. So, as you can tell we aren't very 
consistent here. That's also the reason for hardcoding the values in L101. Left 
a TODO in the CatalogObjects.thrift for now if that's ok with you.


Line 101:   "PARQUET, TEXTFILE, KUDU"));
> use SUPPORTED_INSERT_FORMATS instead of hardcoding
See comment above.


http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java:

Line 80:   private void setColumnDefs(List columnDefs) {
> nit: colDefs everywhere
Done


Line 95:   Map getTblProperties() { return 
tableDef_.getTblProperties(); }
> is there any significance to not adding public/private to some of these met
Casey started using the default access modifier that essentially gives public 
access only to classes in the same package. In theory, we should be using the 
most restrictive access modifier, so I kind of like this. For sure the codebase 
is not consistent with using this rule, so if you think we shouldn't be doing 
this I'll make them public.


Line 192:   "Only Kudu tables can use DISTRIBUTE BY clause.");
> the DISTRIBUTE BY clause
Done


http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

Line 43:  * Represents the table parameters in a CREATE TABLE statement.
> Let's be more precise and list the clauses that this captures.
Done


http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/catalog/KuduTable.java
File fe/src/main/java/org/apache/impala/catalog/KuduTable.java:

Line 97:   // Table name in Kudu storage engine. It may not neccessarily be the 
same as the
> the Kudu storage engine
Done


Line 101:   // 1. For managed tables, 'kuduTableName_' is prefixed with 
'__IMPALA__' to
> Can we make the prefix 'impala::' instead?
Sure no prob. Done


Line 154:* are loaded from HMS. The function also updates the table schema 
in HMS.
> Add a comment about why we also update the HMS. My understanding is that we
Done


Line 195:   msClient.alter_table(msTable_.getDbName(), 
msTable_.getTableName(), msTable_);
> This means every invalidate/refresh will do an extra alter to HMS. Are we c
Good point. We can potentially avoid this by checking if there has been some 
change in the schema. At this point, I am not so terribly worried about this. 
I'll keep a note to actually profile it and see if it's worth optimizing that 
path.


Line 202:* Loads the schema from the Kudu table including column 
definitions and primary key
> mention that it loads the schema into this table as well as the HMS table
Done


Line 216:   String colName = ToSqlUtils.getIdentSql(colSchema.getName());
> getIdentSql() may add backticks, I don't think we want that here
Hm, can you explain why this is the case given that we will use this name to 
populate the columns in the HMS table?


Line 223: numClus

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-17 Thread Dimitris Tsirogiannis (Code Review)

Dimitris Tsirogiannis has uploaded a new patch set (#7).

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..

IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

With this commit we simplify the syntax and handling of CREATE TABLE
statements for both managed and external Kudu tables.

Syntax example:
CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b))
DISTRIBUTE BY HASH (a) INTO 3 BUCKETS,
RANGE (b) SPLIT ROWS (('abc', 'def'))
STORED AS KUDU

Changes:
1) Remove the requirement to specify table properties such as key
   columns in tblproperties.
2) Read table schema (column definitions, primary keys, and distribution
   schemes) from Kudu instead of the HMS.
3) For external tables, the Kudu table is now required to exist at the
   time of creation in Impala.
4) Disallow table properties that could conflict with an existing
   table. Ex: key_columns cannot be specified.
5) Add KUDU as a file format.
6) Add a startup flag to impalad to specify the default Kudu master
   addresses. The flag is used as the default value for the table
   property kudu_master_addresses but it can still be overriden
   using TBLPROPERTIES.
7) Fix a post merge issue (IMPALA-3178) where DROP DATABASE CASCADE
   wasn't implemented for Kudu tables and silently ignored. The Kudu
   tables wouldn't be removed in Kudu.
8) Remove DDL delegates. There was only one functional delegate (for
   Kudu) the existence of the other delegate and the use of delegates in
   general has led to confusion. The Kudu delegate only exists to provide
   functionality missing from Hive.
9) Add PRIMARY KEY at the column and table level. This syntax is fairly
   standard. When used at the column level, only one column can be
   marked as a key. When used at the table level, multiple columns can
   be used as a key. Only Kudu tables are allowed to use PRIMARY KEY.
   The old "kudu.key_columns" table property is no longer accepted
   though it is still used internally. "PRIMARY" is now a keyword.
   The ident style declaration is used for "KEY" because it is also used
   for nested map types.
10) For managed tables, infer a Kudu table name if none was given.
   The table property "kudu.table_name" is optional for managed tables
   and is required for external tables. If for a managed table a Kudu
   table name is not provided, a table name will be generated based
   on the HMS database and table name.
11) Use Kudu master as the source of truth for table metadata instead
   of HMS when a table is loaded or refreshed. Table/column metadata
   are cached in the catalog and are stored in HMS in order to be
   able to use table and column statistics.

Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
---
M be/src/service/frontend.cc
M bin/start-catalogd.sh
M bin/start-impala-cluster.py
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableDataSrcStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/DistributeParam.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
A fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java
A fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
D fe/src/main/java/org/apache/impala/catalog/delegates/DdlDelegate.java
D fe/src/main/java/org/apache/impala/catalog/delegates/KuduDdlDelegate.java
D 
fe/src/main/java/org/apache/impala/catalog/delegates/UnsupportedOpDelegate.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionFilter.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
A fe/src/main/java/org/apache/impala/

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-17 Thread Michael Brown (Code Review)

Michael Brown has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 6:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/4414/6/tests/common/kudu_test_suite.py
File tests/common/kudu_test_suite.py:

PS6, Line 47: super(KuduTestSuite, cls).setup_class()
: if os.environ["KUDU_IS_SUPPORTED"] == "false":
:   pytest.skip("Kudu is not supported")
Maybe move L47 to L50 (switch the ordering of these)? This isn't just a nit: it 
avoids running super.setup_class(), which is non-trivial, and then just 
deciding to skip after all.


PS6, Line 51:   @classmethod
:   def teardown_class(cls):
: pass
Not needed:

1. KuduTestSuite.setup_class() doesn't do anything that needs to be torn down 
here.

2. This overrides super.teardown_class() and doesn't call 
super.teardown_class(). super.teardown_class() 
(ImpalaTestSuite.teardown_class()) does do some work.

So it seems safe to delete this method completely.


http://gerrit.cloudera.org:8080/#/c/4414/6/tests/custom_cluster/test_kudu.py
File tests/custom_cluster/test_kudu.py:

PS6, Line 20: from kudu.client import INT32
ImportError: I think this needs to be kudu.schema


PS6, Line 34:   @classmethod
:   def add_test_dimensions(cls):
: super(CustomClusterTestSuite, cls).add_test_dimensions()
: cls.TestMatrix.add_constraint(lambda v:
: v.get_value('table_format').file_format == 'parquet' and
: v.get_value('table_format').compression_codec == 'none')
I don't think this is needed: no tests here use a vector.


PS6, Line 43:   def test_kudu_master_addrs(self, cursor, kudu):
kudu_client, not kudu. Note this isn't just a rename; it's to make use of the 
kudu_client fixture. There is no "kudu" fixture.


PS6, Line 45: with self.temp_kudu_table(kudu, [INT32]) as kudu_table:
kudu_client


PS6, Line 50: props)
SyntaxError: missing a closing paren here


http://gerrit.cloudera.org:8080/#/c/4414/6/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

PS6, Line 27: from pprint import pprint
Remove: unused import


PS6, Line 30: from copy import copy
: 
: from tests.beeswax.impala_beeswax import ImpalaBeeswaxException
: from tests.common.impala_test_suite import ImpalaTestSuite
: from tests.common.skip import SkipIf
: from tests.common.test_dimensions import 
create_uncompressed_text_dimension
: from tests.common.test_vector import TestDimension
All imported but not used.


PS6, Line 49:   def test_kudu_scan_node(self, vector):
This test is racy in exhaustive: two tests will be competing for the same 
database (kududb_test).

You could play it safe and run it serially, but with a little bit of work, this 
test can be run in parallel if you

1. use the unique_database fixture L49

2. use_db=unique_database L52

3. Modify QueryTest/kudu-scan-node to access dimtbl using the fully qualified 
path (functional_kudu.dimtbl)

4. Remove all uses of kududb_test here and in kudu-scan-node.test

https://gerrit.cloudera.org/#/c/4169/ shows some of the same principles as an 
example.


-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-14 Thread Alex Behm (Code Review)

Alex Behm has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 5:

(41 comments)

Sending out the first wave.

http://gerrit.cloudera.org:8080/#/c/4414/5//COMMIT_MSG
Commit Message:

Line 45:"KEY" is expected to be common enough that the ident style
Might also want to mention that "key" is used for nested map types, so it's not 
desirable to make it a keyword.


Line 51:on the HMS database and table name. If the database is "default",
Does this last regarding "default" make the code easier or more complicated? 
What's the motivation for special casing it, i.e., what's the harm in including 
the "default" database name?


http://gerrit.cloudera.org:8080/#/c/4414/5/be/src/service/frontend.cc
File be/src/service/frontend.cc:

Line 61: DEFINE_string(kudu_master_addrs, "", "Specifies the default Kudu 
master(s). The given "
Can we make this more consistent with out existing options? For example:

-catalog_service_host
-state_store_host


http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
File fe/src/main/java/org/apache/impala/analysis/ColumnDef.java:

Line 176:   static List toColumnNames(Collection columnDefs) 
{
colDefs


Line 184:   static Map 
mapByColumnNames(Collection colDefs) {
comment that colDefs is assumed have no duplicate names, or alternatively deal 
with that case


http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java:

Line 99:   "does not support (%s) file format. Supported formats are: 
(%s)",
the (%s) file format.


Line 100:   createStmt_.getFileFormat().toString().replace("_", ""),
what's with this weird "_" replacement?


Line 101:   "PARQUET, TEXTFILE, KUDU"));
use SUPPORTED_INSERT_FORMATS instead of hardcoding


http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java:

Line 80:   private void setColumnDefs(List columnDefs) {
nit: colDefs everywhere


Line 95:   Map getTblProperties() { return 
tableDef_.getTblProperties(); }
is there any significance to not adding public/private to some of these methods?


Line 192:   "Only Kudu tables can use DISTRIBUTE BY clause.");
the DISTRIBUTE BY clause


http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

Line 43:  * Represents the table parameters in a CREATE TABLE statement.
Let's be more precise and list the clauses that this captures.


http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/catalog/KuduTable.java
File fe/src/main/java/org/apache/impala/catalog/KuduTable.java:

Line 97:   // Table name in Kudu storage engine. It may not neccessarily be the 
same as the
the Kudu storage engine


Line 101:   // 1. For managed tables, 'kuduTableName_' is prefixed with 
'__IMPALA__' to
Can we make the prefix 'impala::' instead?

Better to keep the "impala" part lowercase for consistency.

It might be a minor detail, but keeping the name readable seems important. I'd 
imagine that Kudu tables will be accessed from several tools.


Line 154:* are loaded from HMS. The function also updates the table schema 
in HMS.
Add a comment about why we also update the HMS. My understanding is that we 
want to propagate alterations made to the Kudu table to the HMS.


Line 195:   msClient.alter_table(msTable_.getDbName(), 
msTable_.getTableName(), msTable_);
This means every invalidate/refresh will do an extra alter to HMS. Are we 
concerned about the extra load?


Line 202:* Loads the schema from the Kudu table including column 
definitions and primary key
mention that it loads the schema into this table as well as the HMS table


Line 216:   String colName = ToSqlUtils.getIdentSql(colSchema.getName());
getIdentSql() may add backticks, I don't think we want that here


Line 223: numClusteringCols_ = primaryKeyColumnNames_.size();
Check that there is at least one PK column


Line 228: PartitionSchema partitionSchema = kuduTable.getPartitionSchema();
Preconditions.checkState(!colsByPos_.isEmpty()); ?


Line 254:   public static KuduTable createTmpTable(Db db,
createCtasTarget()? Seems more explicit since there really is only one use.


http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

Line 1117: if (db != null) dropTablesFromKudu(db);
Won't this drop all Kudu tables even if CASCADE is not specified?


Line 1151: // the metadata must be fetched from the Hive Metastore.
Why do we

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-14 Thread Dimitris Tsirogiannis (Code Review)

Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 5:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/4414/5/bin/start-impala-cluster.py
File bin/start-impala-cluster.py:

PS5, Line 73: parser.add_option("--kudu_masters", default="127.0.0.1",
> Could we import the default value from tests.common instead of hardcoding i
Good point. Done


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/analysis/DistributeParam.java
File fe/src/main/java/org/apache/impala/analysis/DistributeParam.java:

Line 129:   if 
(!org.apache.impala.catalog.Type.areAssignmentCompatibleTypes(colType,
> you want to be able to assign the split value to the column type with just 
Done


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/catalog/KuduTable.java
File fe/src/main/java/org/apache/impala/catalog/KuduTable.java:

PS4, Line 226:   private void 
loadDistributeByParams(org.apache.kudu.client.KuduTable kuduTable) {
 : Preconditions
> yes, please leave a comment that you're resetting the mstable cols if there
Done


PS4, Line 226:   private void 
loadDistributeByParams(org.apache.kudu.client.KuduTable kuduTable) {
 : Preconditions
> I see. yes, a brief comment would be helpful.
Done


http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/catalog/KuduTable.java
File fe/src/main/java/org/apache/impala/catalog/KuduTable.java:

Line 157:   public void load(boolean reuseMetadata /* not used */, 
IMetaStoreClient msClient,
> rename reuseMetadata to dummy
Good point. Done


PS5, Line 193:   
msTable_.putToParameters(StatsSetupConst.DO_NOT_UPDATE_STATS,
 :   StatsSetupConst.TRUE);
> This also needs to be stripped out later for SHOW CREATE TABLE.
Done


Line 217:   cols.add(new FieldSchema(colName, type.toSql().toLowerCase(), 
null));
> this also updates msTable_
Added a comment to clarify it. Currently, we need to keep the schema in HMS 
consistent with the one in Kudu in order to use the column statistics. Once we 
move away from this and Kudu provides us with proper statistics API we will get 
rid of this annoying dance around msTable_.


Line 295: TDistributeByRangeParam rangeParam = 
param.getBy_range_param();
> checkstate(isset...)
Done


http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

PS5, Line 1484: // Add the table to the catalog cache
  : Table newTbl = catalog_.addTable(newTable.getDbName(), 
newTable.getTableName());
  : addTableToCatalogUpdate(newTbl, response.result);
> If something here throws (e.g. addTable() looks like it could throw), we pr
Actually, addTable should not be declared to throw because none of its 
operations throw an exception. addTableToCatalogUpdate also doesn't throw, so I 
believe it's safe to assume that we don't need to rollback the hms change here. 
Modified the addTable() function in CatalogServiceCatalog.


http://gerrit.cloudera.org:8080/#/c/4414/5/tests/common/kudu_test_suite.py
File tests/common/kudu_test_suite.py:

PS5, Line 38: @SkipIf.kudu_not_supported
> This is extremely risky due to bugs in our version of old pytest. See IMPAL
Done. Let me know is that's ok now.


http://gerrit.cloudera.org:8080/#/c/4414/5/tests/custom_cluster/test_kudu.py
File tests/custom_cluster/test_kudu.py:

PS5, Line 1: # Copyright (c) 2016 Cloudera, Inc. All rights reserved.
> Remove Cloudera copyright and make sure the license text is correct otherwi
Done


http://gerrit.cloudera.org:8080/#/c/4414/5/tests/metadata/test_ddl.py
File tests/metadata/test_ddl.py:

PS5, Line 220: 
> Why remove this skip?
Hm, that wasn't intentional. I put it back.


PS5, Line 221: 
> Is this removal safe?
Same here.


PS5, Line 221: self.expected_exceptions = 2
> I can't see where this is used; delete?
Done


http://gerrit.cloudera.org:8080/#/c/4414/5/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

PS5, Line 87: print("Describe formatted output:")
: pprint(table_desc)
> Use LOG.info()
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-14 Thread Dimitris Tsirogiannis (Code Review)

Dimitris Tsirogiannis has uploaded a new patch set (#6).

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..

IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

With this commit we simplify the syntax and handling of CREATE TABLE
statements for both managed and external Kudu tables.

Syntax example:
CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b))
DISTRIBUTE BY HASH (a) INTO 3 BUCKETS,
RANGE (b) SPLIT ROWS (('abc', 'def'))
STORED AS KUDU

Changes:
1) Remove the requirement to specify table properties such as key
   columns in tblproperties.
2) Read table schema (column definitions, primary keys, and distribution
   schemes) from Kudu instead of the HMS.
3) For external tables, the Kudu table is now required to exist at the
   time of creation in Impala.
4) Disallow table properties that could conflict with an existing
   table. Ex: key_columns cannot be specified.
5) Add KUDU as a file format.
6) Add a startup flag to impalad to specify the default Kudu master
   addresses. The flag is used as the default value for the table
   property kudu_master_addresses but it can still be overriden
   using TBLPROPERTIES.
7) Fix a post merge issue (IMPALA-3178) where DROP DATABASE CASCADE
   wasn't implemented for Kudu tables and silently ignored. The Kudu
   tables wouldn't be removed in Kudu.
8) Remove DDL delegates. There was only one functional delegate (for
   Kudu) the existence of the other delegate and the use of delegates in
   general has led to confusion. The Kudu delegate only exists to provide
   functionality missing from Hive.
9) Add PRIMARY KEY at the column and table level. This syntax is fairly
   standard. When used at the column level, only one column can be
   marked as a key. When used at the table level, multiple columns can
   be used as a key. Only Kudu tables are allowed to use PRIMARY KEY.
   The old "kudu.key_columns" table property is no longer accepted
   though it is still used internally. "PRIMARY" is now a keyword.
   "KEY" is expected to be common enough that the ident style
   declaration is used instead to avoid conflicts.
10) For managed tables, infer a Kudu table name if none was given.
   The table property "kudu.table_name" is optional for managed tables
   and is required for external tables. If for a managed table a Kudu
   table name is not provided, a table name will be generated based
   on the HMS database and table name. If the database is "default",
   then a Kudu table name will not include the database.
11) Use Kudu master as the source of truth for table metadata instead
   of HMS when a table is loaded or refreshed. Table/column metadata
   are cached in the catalog and are stored in HMS in order to be
   able to use table and column statistics.

Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
---
M be/src/service/frontend.cc
M bin/start-catalogd.sh
M bin/start-impala-cluster.py
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableDataSrcStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/DistributeParam.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
A fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java
A fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
D fe/src/main/java/org/apache/impala/catalog/delegates/DdlDelegate.java
D fe/src/main/java/org/apache/impala/catalog/delegates/KuduDdlDelegate.java
D 
fe/src/main/java/org/apache/impala/catalog/delegates/UnsupportedOpDelegate.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionFilter.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.jav

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-14 Thread Matthew Jacobs (Code Review)

Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/catalog/KuduTable.java
File fe/src/main/java/org/apache/impala/catalog/KuduTable.java:

PS5, Line 193:   
msTable_.putToParameters(StatsSetupConst.DO_NOT_UPDATE_STATS,
 :   StatsSetupConst.TRUE);
This also needs to be stripped out later for SHOW CREATE TABLE.


-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-12 Thread Marcel Kornacker (Code Review)

Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 5:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/analysis/DistributeParam.java
File fe/src/main/java/org/apache/impala/analysis/DistributeParam.java:

Line 129:   if 
(!org.apache.impala.catalog.Type.areAssignmentCompatibleTypes(colType,
> Done
you want to be able to assign the split value to the column type with just an 
implicit cast, if any. in other words, for a bigint col it's okay to have a 
split value of 127, but for a tinyint col, you can't have a split value of 
maxint.

you can call isImplicitlyCastable(exprType, colType).


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/catalog/KuduTable.java
File fe/src/main/java/org/apache/impala/catalog/KuduTable.java:

PS4, Line 226:   private void 
loadDistributeByParams(org.apache.kudu.client.KuduTable kuduTable) {
 : Preconditions
> cols is a reference to msTable cols. We clear them here and reload them fro
yes, please leave a comment that you're resetting the mstable cols if there are 
any. i tripped over that as well.


http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/catalog/KuduTable.java
File fe/src/main/java/org/apache/impala/catalog/KuduTable.java:

Line 157:   public void load(boolean reuseMetadata /* not used */, 
IMetaStoreClient msClient,
rename reuseMetadata to dummy


Line 217:   cols.add(new FieldSchema(colName, type.toSql().toLowerCase(), 
null));
this also updates msTable_

document this side effect in the function comment. this feels pretty convoluted 
and is hard to follow. is this really necessary or should we simply ignore the 
column info stored in mstable?


Line 295: TDistributeByRangeParam rangeParam = 
param.getBy_range_param();
checkstate(isset...)


-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-12 Thread Matthew Jacobs (Code Review)

Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/catalog/KuduTable.java
File fe/src/main/java/org/apache/impala/catalog/KuduTable.java:

PS4, Line 175: }
> Done
While it does seem related, does it really mean the same thing?

Either way, we have to be sure we're handling it correctly everywhere. E.g. 
does  Table.getClusteringColumns(), getNonClusteringColumns(), work as expected?
They also get stored in the thrift TTable differently, see Table.java:


  // Clustering columns come first.
  if (i < numClusteringCols_) {
table.addToClustering_columns(colDesc);
  } else {
table.addToColumns(colDesc);
  }


Further, Table.getNumClusteringCols() exposes this and it is used in a lot of 
places.

E.g. TupleDescriptor.java:
 
  public boolean hasClusteringColsOnly() {
Table table = getTable();
if (!(table instanceof HdfsTable) || table.getNumClusteringCols() == 0) 
return false;
...


There are a ton of other places to check.


PS4, Line 226:   private void 
loadDistributeByParams(org.apache.kudu.client.KuduTable kuduTable) {
 : Preconditions
> cols is a reference to msTable cols. We clear them here and reload them fro
I see. yes, a brief comment would be helpful.


http://gerrit.cloudera.org:8080/#/c/4414/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

PS5, Line 1484: // Add the table to the catalog cache
  : Table newTbl = catalog_.addTable(newTable.getDbName(), 
newTable.getTableName());
  : addTableToCatalogUpdate(newTbl, response.result);
If something here throws (e.g. addTable() looks like it could throw), we 
probably need to roll back the hiveClient.createTable() call, right?


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java:

PS4, Line 135:   tableOpts.setRangePartitionColumns(
 :   distParam.getBy_range_param().getColumns());
 :
> Unfortunately it is. I spoke to Dan (from Kudu team) about it. If the user 
Thanks, I followed up with an e-mail to Dan to make sure I understand the Kudu 
behavior.


-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-12 Thread Michael Brown (Code Review)

Michael Brown has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 5:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/4414/5/bin/start-impala-cluster.py
File bin/start-impala-cluster.py:

PS5, Line 73: parser.add_option("--kudu_masters", default="127.0.0.1",
Could we import the default value from tests.common instead of hardcoding it?


http://gerrit.cloudera.org:8080/#/c/4414/5/tests/common/kudu_test_suite.py
File tests/common/kudu_test_suite.py:

PS5, Line 38: @SkipIf.kudu_not_supported
This is extremely risky due to bugs in our version of old pytest. See 
IMPALA-3614 for an example of how this is risky. I think you will have to work 
around this by skipping in setup_class.


http://gerrit.cloudera.org:8080/#/c/4414/5/tests/custom_cluster/test_kudu.py
File tests/custom_cluster/test_kudu.py:

PS5, Line 1: # Copyright (c) 2016 Cloudera, Inc. All rights reserved.
Remove Cloudera copyright and make sure the license text is correct otherwise.


http://gerrit.cloudera.org:8080/#/c/4414/5/tests/metadata/test_ddl.py
File tests/metadata/test_ddl.py:

PS5, Line 220: 
Why remove this skip?


PS5, Line 221: 
Is this removal safe?


PS5, Line 221: self.expected_exceptions = 2
I can't see where this is used; delete?


http://gerrit.cloudera.org:8080/#/c/4414/5/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

PS5, Line 87: print("Describe formatted output:")
: pprint(table_desc)
Use LOG.info()


-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-11 Thread Dimitris Tsirogiannis (Code Review)

Dimitris Tsirogiannis has uploaded a new patch set (#5).

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..

IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

With this commit we simplify the syntax and handling of CREATE TABLE
statements for both managed and external Kudu tables.

Syntax example:
CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b))
DISTRIBUTE BY HASH (a) INTO 3 BUCKETS,
RANGE (b) SPLIT ROWS (('abc', 'def'))
STORED AS KUDU

Changes:
1) Remove the requirement to specify table properties such as key
   columns in tblproperties.
2) Read table schema (column definitions, primary keys, and distribution
   schemes) from Kudu instead of the HMS.
3) For external tables, the Kudu table is now required to exist at the
   time of creation in Impala.
4) Disallow table properties that could conflict with an existing
   table. Ex: key_columns cannot be specified.
5) Add KUDU as a file format.
6) Add a startup flag to impalad to specify the default Kudu master
   addresses. The flag is used as the default value for the table
   property kudu_master_addresses but it can still be overriden
   using TBLPROPERTIES.
7) Fix a post merge issue (IMPALA-3178) where DROP DATABASE CASCADE
   wasn't implemented for Kudu tables and silently ignored. The Kudu
   tables wouldn't be removed in Kudu.
8) Remove DDL delegates. There was only one functional delegate (for
   Kudu) the existence of the other delegate and the use of delegates in
   general has led to confusion. The Kudu delegate only exists to provide
   functionality missing from Hive.
9) Add PRIMARY KEY at the column and table level. This syntax is fairly
   standard. When used at the column level, only one column can be
   marked as a key. When used at the table level, multiple columns can
   be used as a key. Only Kudu tables are allowed to use PRIMARY KEY.
   The old "kudu.key_columns" table property is no longer accepted
   though it is still used internally. "PRIMARY" is now a keyword.
   "KEY" is expected to be common enough that the ident style
   declaration is used instead to avoid conflicts.
10) For managed tables, infer a Kudu table name if none was given.
   The table property "kudu.table_name" is optional for managed tables
   and is required for external tables. If for a managed table a Kudu
   table name is not provided, a table name will be generated based
   on the HMS database and table name. If the database is "default",
   then a Kudu table name will not include the database.
11) Use Kudu master as the source of truth for table metadata instead
   of HMS when a table is loaded or refreshed. Table/column metadata
   are cached in the catalog and are stored in HMS in order to be
   able to use table and column statistics.

Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
---
M be/src/service/frontend.cc
M bin/start-catalogd.sh
M bin/start-impala-cluster.py
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableDataSrcStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/DistributeParam.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
A fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java
A fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
D fe/src/main/java/org/apache/impala/catalog/delegates/DdlDelegate.java
D fe/src/main/java/org/apache/impala/catalog/delegates/KuduDdlDelegate.java
D 
fe/src/main/java/org/apache/impala/catalog/delegates/UnsupportedOpDelegate.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionFilter.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
A fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.jav

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-11 Thread Dimitris Tsirogiannis (Code Review)

Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 4:

(54 comments)

http://gerrit.cloudera.org:8080/#/c/4414/4/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

Line 53: enum THdfsFileFormat {
> rename
This change would touch many places. Would you mind postponing it for a follow 
up patch?


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

Line 976:   tbl_def_without_col_defs:tbl_def
> a 'create table' without col defs?
This was added for the EXTERNAL Kudu table use case for which no column 
definitions are specified since we load the schema from the Kudu table. Added a 
comment.


Line 980: RESULT = new CreateTableStmt(tbl_def); 
> trailing
Done


Line 1033: // class doesn't inherit from CreateTableStmt.
> should it?
To my opinion yes it should. I actually went down the path of refactoring all 
the CREATE TABLE* statements but it ended up being too complex to add on top of 
this big patch. Simplifying the CREATE TABLE statements will also allow us to 
remove some of the weird table option handling we do in TableDef.java. I will 
leave a TODO for now.


Line 1065: primary_keys_val ::=
> opt_primary_keys?
Done


Line 1089: tbl_data_layout ::=
> opt_...?
Done


Line 1139:   {: 
> fix spaces and tabs
Done


Line 1370:   KW_PRIMARY key_ident
> what's wrong with KW_KEY?
I don't think we can do that. Don't we use "key" for nested types (map)?


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java
File fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java:

Line 30:   static void throwIfNotNullOrNotEmpty(Collection c, String message)
> this is actually 'not null *and* not empty'. you can also phrase that as 'n
Good point. Done


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java:

Line 158: if (fileFormat_ == THdfsFileFormat.KUDU) {
> check at top
Done


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java:

Line 236: String.format("PRIMARY KEY must be used instead of the table 
property '%s'.",
> not good: if you do that on an external table, you get this error message i
I like that idea but it's a bit more complicated for Kudu because different 
properties are valid depending on whether it's an external or managed table. 
Moved the check to the function below. Let me know if that works ok.


Line 310:   distributeParam.setPKColumnDefMap(pkColumnDefsByName);
> setPkColumn...
Done


Line 315:   private boolean hasPrimaryKeysSpecified() {
> hasPrimaryKeySpecified (there's only one, which can be a composite key)
Done


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/analysis/DistributeParam.java
File fe/src/main/java/org/apache/impala/analysis/DistributeParam.java:

Line 121:   org.apache.impala.catalog.Type colType = colDef.getType();
> does simply Type conflict with something?
Yeah, there is a conflict with the enum Type in this class.


Line 129:   if (colType.isStringType() && !exprType.isStringType()
> this is basically looking for 'assignment compatible', and i'm sure we alre
Done


Line 150:   builder.append(numBuckets_).append(" BUCKETS");
> sprinkle some checkstates in here (on numbuckets and splitrows; or maybe a 
I added checks for numBuckets_. Split rows will go away in a follow up patch 
with the new range partitioning syntax. I left a TODO to add a validate 
function then.


Line 200: literal.setString_literal(expr.getStringValue());
> checkstate that you're getting something valid
Done


Line 211:   void setPKColumnDefMap(Map pkColumnDefByName) {
> setPkC...
Done


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

Line 74:   static class TableDefOptions {
> 'Options' is enough
Done


Line 160: fullyQualifiedTableName_ = analyzer.getFqTableName(getTblName());
> stick with fq abbreviation?
Done


Line 189: for (ColumnDef colDef: getPartitionColumnDefs()) {
> this is a bit hard to follow. partition cols aren't defined separately, the
These are the columns specified in a PARTITIONED BY clause (non-kudu) and they 
should be analyzed, no? Sorry, I am not sure I follow your comment.


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
File fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java:

PS4, Line 97: org.apache
> should this have changed?
Hm sed is not that s

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-11 Thread Michael Brown (Code Review)

Michael Brown has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4414/1/tests/common/kudu_test_suite.py
File tests/common/kudu_test_suite.py:

PS1, Line 54:   def auto_create_db(cls):
: return True
: 
:   @classmethod
:   def get_db_name(cls):
: # When py.test runs with the xdist plugin, several processes 
are started and each
: # process runs some partition of the tests. It's possible 
that multiple processes
: # will call this method. A random value is generated so the 
processes won't try
: # to use the same database at the same time. The value is 
cached so within a single
: # process the same database name is always used for the 
class. This doesn't need to
: # be thread-safe since multi-threading is never used.
: if not cls.__DB_NAME
> Duno. I'll check with Michael.
To answer this: It looks like this provides the same functionality as 
unique_database, but over a different channel (the conn/cursor fixtures in 
conftest.py). The design of KuduTestSuite and dependents are so different than 
most of our test cases. While switching to unique_database can be done, it will 
be nontrivial, and would require a bit of a change in test design.


-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-11 Thread Marcel Kornacker (Code Review)

Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 4:

(37 comments)

http://gerrit.cloudera.org:8080/#/c/4414/4/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

Line 53: enum THdfsFileFormat {
rename


Line 63: // rename this enum to not be Hdfs specific.
resolve


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

Line 976:   tbl_def_without_col_defs:tbl_def
a 'create table' without col defs?


Line 980: RESULT = new CreateTableStmt(tbl_def); 
trailing


Line 1033: // class doesn't inherit from CreateTableStmt.
should it?


Line 1065: primary_keys_val ::=
opt_primary_keys?


Line 1089: tbl_data_layout ::=
opt_...?


Line 1139:   {: 
fix spaces and tabs


Line 1370:   KW_PRIMARY key_ident
what's wrong with KW_KEY?


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java
File fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java:

Line 30:   static void throwIfNotNullOrNotEmpty(Collection c, String message)
this is actually 'not null *and* not empty'. you can also phrase that as 'not 
empty'.


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java:

Line 158: if (fileFormat_ == THdfsFileFormat.KUDU) {
check at top


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java:

Line 236: String.format("PRIMARY KEY must be used instead of the table 
property '%s'.",
not good: if you do that on an external table, you get this error message 
instead of 'primary key not allowed'. move this check into the next function.

alternatively, why not have a list of valid table properties per storage 
format, and then flag everything else as invalid?


Line 310:   distributeParam.setPKColumnDefMap(pkColumnDefsByName);
setPkColumn...


Line 315:   private boolean hasPrimaryKeysSpecified() {
hasPrimaryKeySpecified (there's only one, which can be a composite key)

hasPrimaryKey would express the same

when is it legal to call this (before/after analyze()?)


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/analysis/DistributeParam.java
File fe/src/main/java/org/apache/impala/analysis/DistributeParam.java:

Line 121:   org.apache.impala.catalog.Type colType = colDef.getType();
does simply Type conflict with something?


Line 129:   if (colType.isStringType() && !exprType.isStringType()
this is basically looking for 'assignment compatible', and i'm sure we already 
have code somewhere to express that more succinctly.


Line 150:   builder.append(numBuckets_).append(" BUCKETS");
sprinkle some checkstates in here (on numbuckets and splitrows; or maybe a 
validate() function that does those checks, toThrift() would also benefit from 
it)


Line 200: literal.setString_literal(expr.getStringValue());
checkstate that you're getting something valid


Line 211:   void setPKColumnDefMap(Map pkColumnDefByName) {
setPkC...


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

Line 74:   static class TableDefOptions {
'Options' is enough


Line 84: // File format of the table
since this is clearly not a file format anymore, TStorageFormat?


Line 160: fullyQualifiedTableName_ = analyzer.getFqTableName(getTblName());
stick with fq abbreviation?


Line 189: for (ColumnDef colDef: getPartitionColumnDefs()) {
this is a bit hard to follow. partition cols aren't defined separately, they're 
declared. so then why do you need to call colDef.analyze()?


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/catalog/KuduTable.java
File fe/src/main/java/org/apache/impala/catalog/KuduTable.java:

Line 111:   // Distribution schemes of this Kudu table. Both rang and 
hash-based distributions are
range


Line 140: return msTbl.getParameters().get(KuduTable.KEY_TABLE_NAME);
i don't think this is worth a function call, it just makes the code harder to 
follow (extra level of indirection)


PS4, Line 175: numClusteringCols_ = 0;
> not really related to this change, but it's kind of confusing to have numCl
those should be the primary key cols


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

Line 1353: "functional.alltypestiny", "Columns cannot be specified with 
an external " +
odd error message. i would expect the 'as select' to be

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-10 Thread Matthew Jacobs (Code Review)

Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 4:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
File fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java:

PS4, Line 97: org.apache
should this have changed?


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/catalog/KuduTable.java
File fe/src/main/java/org/apache/impala/catalog/KuduTable.java:

PS4, Line 89: com.cloudera
can you add a ref to IMPALA-4271 ?


PS4, Line 160: know
known


PS4, Line 159:   /**
 :* The number of nodes is not know ahead of time and will be 
updated during computeStats
 :* in the scan node.
 :*/
 :   public int getNumNodes() { return -1; }
I don't see this used


PS4, Line 175: numClusteringCols_ = 0;
not really related to this change, but it's kind of confusing to have 
numClusteringCols_


PS4, Line 226: List cols = msTable_.getSd().getCols();
 : cols.clear();
why do we get cols from getCols() and then clear() it?


PS4, Line 232: cols.add(new FieldSchema(colName, type.toSql().toLowerCase(), 
null));
why do we do this? cols isn't used later


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

PS4, Line 460: msTbl.getTableType().equals(TableType.EXTERNAL_TABLE.toString());
we shuold probably compare case insensitive to be safe


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

PS4, Line 1147: occurrs
occurs


PS4, Line 1482:   } catch (Exception e) {
  : try {
  :   // Error creating the table in HMS, drop the managed 
table from Kudu.
  :   if (!Table.isExternalTable(newTable)) {
  : KuduCatalogOpExecutor.dropTable(newTable, false);
  :   }
  : } catch (Exception logged) {
  :   String kuduTableName = 
newTable.getParameters().get(KuduTable.KEY_TABLE_NAME);
  :   LOG.error(String.format("Failed to drop Kudu table 
'%s'", kuduTableName),
  :   logged);
  :   throw new RuntimeException(String.format("Failed to 
create the table '%s' in " +
  :   " the Metastore and the newly created Kudu table 
'%s' could not be " +
  :   " dropped. The log contains more information.", 
newTable.getTableName(),
  :   kuduTableName), e);
  : }
  : if (e instanceof AlreadyExistsException && 
params.if_not_exists) return false;
  : throw new ImpalaRuntimeException(
  : String.format(HMS_RPC_ERROR_FORMAT_STR, 
"createTable"), e);
it looks like none of this really needs to be inside the synchronized block. 
Most of it doesn't matter but dropTable creates a KuduClient and makes an RPC. 
I could imagine this being a problem if Kudu is overwhelmed. You could catch 
the exception outside of the synchronized block, maybe some extra handling to 
know if it was created in HMS or not. Though from the looks of this we assume 
no exceptions from l1502 and 1503.


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

PS4, Line 232: if (!req.is_delta) {
 :   catalog = new ImpaladCatalog(defaultKuduMasterAddrs_);
 : }
1line


http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java:

PS4, Line 135:   if (!hasRangePartitioning) {
 : 
tableOpts.setRangePartitionColumns(Collections.emptyList());
 :   }
I don't think this is necessary


PS4, Line 175: erros
errors


PS4, Line 192: cols.clear();
can you indicate in the comment that this doesn't just populate msTbl's cols, 
it replaces whatever was there.


PS4, Line 206: new KuduClient
I'm not crazy about this wrapper class thing. It's only used in this file.


PS4, Line 212: is accessible
exists


PS4, Line 215: validateTblProperties
how about validateKuduTblExists ?


PS4, Line 224: Error accessing table in Kudu " +
 :   "master '%s'
This could also print the name. Also to avoid confusing with potential future 
auth rules, can this say something like:

"Kudu table TABLENAME on master MASTERADDRS does not exist."


http://gerrit.cloudera.

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-10 Thread Dimitris Tsirogiannis (Code Review)

Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4414/3/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

PS3, Line 350: // Parameters needed for hash distribution
 : struct TDistributeByHashParam {
 :   1: required list columns
 :   2: required i32 num_buckets
 : }
 : 
 : struct TRangeLiteral {
 :   1: optional i64 int_literal
 :   2: optional string string_literal
 : }
 : 
 : struct TRangeLiteralList {
 :   1: required list values
 : }
 : 
 : // A range distribution is identified by a list of columns and a 
series of split rows.
 : struct TDistributeByRangeParam {
 :   1: required list columns
 :   2: optional list split_rows;
 : }
 : 
 : // Parameters for the DISTRIBUTE BY clause. The actual 
distribution is identified by
 : // the type parameter.
 : struct TDistributeParam {
 :   // Set if type is set to HASH
 :   1: optional TDistributeByHashParam by_hash_param;
 : 
 :   // Set if type is set to RANGE
 :   2: optional TDistributeByRangeParam by_range_param;
 : }
> I don't see why this needs to be in the TKuduTable though. I think we just 
I consider the distribution scheme to be part of table metadata that are loaded 
in the catalog. Hence, they live in the KuduTable object and need to be 
serialized before they get sent to all the impalad nodes. Currently, the 
distribution schemes are only used for the SHOW CREATE TABLE statement. They 
should always be set in a TKuduTable. The part that is not set if the list of 
split values because they are not available from the Kudu API.


-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-10 Thread Matthew Jacobs (Code Review)

Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4414/3/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

PS3, Line 350: // Parameters needed for hash distribution
 : struct TDistributeByHashParam {
 :   1: required list columns
 :   2: required i32 num_buckets
 : }
 : 
 : struct TRangeLiteral {
 :   1: optional i64 int_literal
 :   2: optional string string_literal
 : }
 : 
 : struct TRangeLiteralList {
 :   1: required list values
 : }
 : 
 : // A range distribution is identified by a list of columns and a 
series of split rows.
 : struct TDistributeByRangeParam {
 :   1: required list columns
 :   2: optional list split_rows;
 : }
 : 
 : // Parameters for the DISTRIBUTE BY clause. The actual 
distribution is identified by
 : // the type parameter.
 : struct TDistributeParam {
 :   // Set if type is set to HASH
 :   1: optional TDistributeByHashParam by_hash_param;
 : 
 :   // Set if type is set to RANGE
 :   2: optional TDistributeByRangeParam by_range_param;
 : }
> We store the distribute by params in the KuduTable class which is then seri
I don't see why this needs to be in the TKuduTable though. I think we just need 
this in the TCreateTableParams. I wouldn't mind except it's more work to reason 
about when it's set/not set in a TKuduTable.


PS3, Line 390: 
 :   // Distribution schemes
 :   4: required list distribute_by
> See my comment above. We do load and serialize the distribute by params in 
I don't see why this needs to be in TKuduTable, only TCreateTableParams.


http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/DistributeParam.java
File fe/src/main/java/org/apache/impala/analysis/DistributeParam.java:

PS3, Line 156:   if (splitRows_ == null) {
 : builder.append("...");
> This is possible because an object of DistributeParam class is created in t
In case #2 why do we even bother creating a DistributeParams? 

The partitioning/distribution info only really makes sense at the time of table 
creation I think, right? This goes back to our conversation (in person) about 
how it'd probably make sense to separate the partitioning part of the command 
from CREATE TABLE, and do partitioning more like how we would for non-Kudu 
tables, i.e. separate commands. Obviously that hasn't been worked through and 
realistically we probably won't get to that in time.

If we don't get to something like that, I think the only reason we'd want #2 
(in terms of loading DistributeParams) is SHOW CREATE TABLE, so this class only 
needs to support toSql() in case #2. Is that right? Can you add a comment in 
the class header or the relevant constructor where we might get case #2 so it's 
easier to reason about?


A TODO / JIRA ref about resolving the '...' would be good. It might be good to 
add a test case with this output if there isn't one already to make sure we fix 
it properly and/or it doesn't change in some other unexpected way.


PS3, Line 191:   if (splitRows_ == null) {
 : result.setBy_range_param(rangeParam);
 : return result;
 :   }
> See explanation above.
I guess it's confusing to me because TDistributeParam doesn't seem to be 
necessary in case #2.

In case #1 it gets added to a TCreateTableStmt object. In case #2 it gets added 
to a TKuduTable object, but it seems unnecessary given that no other backend or 
even the catalog care about the distribute params - right? If that's true, I 
think we could avoid storing this in KuduTable, then avoid calling this in 
KuduTable.getTKuduTable(), and then throwing here if splitRows_ is null since I 
don't think this should be called in this case.


http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

PS3, Line 126: return fullyQualifiedTableName_ != null ? 
fullyQualifiedTableName_ : tableName_;
> We do call this function before tableName_ has been analyzed. One example i
ah ok, can you add a comment?


http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java:

PS3, Line 243: List primaryKeysSql, String kuduDistributeByParams,
can you update the comment, especially around which of these should/can be null 
vs. empty.

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-10 Thread Dimitris Tsirogiannis (Code Review)

Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 3:

(23 comments)

http://gerrit.cloudera.org:8080/#/c/4414/3/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

PS3, Line 344: // Enum listing all possible DISTRIBUTE BY types
 : enum TDistributeType {
 :   HASH,
 :   RANGE,
> I don't see this used
Correct. Removed.


PS3, Line 350: // Parameters needed for hash distribution
 : struct TDistributeByHashParam {
 :   1: required list columns
 :   2: required i32 num_buckets
 : }
 : 
 : struct TRangeLiteral {
 :   1: optional i64 int_literal
 :   2: optional string string_literal
 : }
 : 
 : struct TRangeLiteralList {
 :   1: required list values
 : }
 : 
 : // A range distribution is identified by a list of columns and a 
series of split rows.
 : struct TDistributeByRangeParam {
 :   1: required list columns
 :   2: optional list split_rows;
 : }
 : 
 : // Parameters for the DISTRIBUTE BY clause. The actual 
distribution is identified by
 : // the type parameter.
 : struct TDistributeParam {
 :   // Set if type is set to HASH
 :   1: optional TDistributeByHashParam by_hash_param;
 : 
 :   // Set if type is set to RANGE
 :   2: optional TDistributeByRangeParam by_range_param;
 : }
> I see that these are used in the serialized catalog objects, but given that
We store the distribute by params in the KuduTable class which is then 
serialized and sent to al the impalads. So, I think we do need them in this 
case, unless I misunderstood your comment.


PS3, Line 390: 
 :   // Distribution schemes
 :   4: required list distribute_by
> same for this, can we get rid of it?
See my comment above. We do load and serialize the distribute by params in the 
KuduTable class.


http://gerrit.cloudera.org:8080/#/c/4414/3/common/thrift/JniCatalog.thrift
File common/thrift/JniCatalog.thrift:

PS3, Line 398: CatalogObjects.TDistributeParam
> wrt my comments in CatalogObjects, I guess we'd need them here, but I don't
See my responses to your comments and let me know if it's still not clear.


http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/ColumnDefOptions.java
File fe/src/main/java/org/apache/impala/analysis/ColumnDefOptions.java:

PS3, Line 21:  * Placeholder for the column definition options of a CREATE 
TABLE statement.
:  * Contains the list of column definitions and, optionally, the 
list of column names
:  * specified using the PRIMARY KEY keyword.
> this feels like a weird abstraction... do we really need this class?
This was used only during parsing to store the "(col type, col type, PRIMARY 
KEY (col, ..))" part of the create table stmt. I modified the parser so that 
this isn't needed anymore. Done


http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java:

Line 87:   InsertStmt getInsertStmt() { return insertStmt_; }
> why change visibility?
Done


http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java:

PS3, Line 344: getTblPrimaryKeyColumnNames(),  null,
> Why would there be PK col names?
Done


PS3, Line 344:   
> nit extra space
Done


http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java:

PS3, Line 167: if (getFileFormat() != THdfsFileFormat.KUDU && 
KuduTable.KUDU_STORAGE_HANDLER.equals(
 : getTblProperties().get(KuduTable.KEY_STORAGE_HANDLER))) {
 :   throw new 
AnalysisException(KUDU_STORAGE_HANDLER_ERROR_MESSAGE);
 : }
 : 
 : // Avro tables can have empty column defs because they can 
infer them from the Avro
 : // schema. Likewise for external Kudu tables, the schema can 
be read from Kudu.
 : if (getColumnDefs().isEmpty() && getFileFormat() != 
THdfsFileFormat.AVRO
 : && getFileFormat() != THdfsFileFormat.KUDU) {
 :   throw new AnalysisException("Table requires at least 1 
column");
 : }
 : 
 : if (getFileFormat() == THdfsFileFormat.KUDU) {
 :   ana

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-10 Thread Dimitris Tsirogiannis (Code Review)

Dimitris Tsirogiannis has uploaded a new patch set (#4).

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..

IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

With this commit we simplify the syntax and handling of CREATE TABLE
statements for both managed and external Kudu tables.

Syntax example:
CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b))
DISTRIBUTE BY HASH (a) INTO 3 BUCKETS,
RANGE (b) SPLIT ROWS (('abc', 'def'))
STORED AS KUDU

Changes:
1) Remove the requirement to specify table properties such as key
   columns in tblproperties.
2) Read table schema (column definitions, primary keys, and distribution
   schemes) from Kudu instead of the HMS.
3) For external tables, the Kudu table is now required to exist at the
   time of creation in Impala.
4) Disallow table properties that could conflict with an existing
   table. Ex: key_columns cannot be specified.
5) Add KUDU as a file format.
6) Add a startup flag to impalad to specify the default Kudu master
   addresses. The flag is used as the default value for the table
   property kudu_master_addresses but it can still be overriden
   using TBLPROPERTIES.
7) Fix a post merge issue (IMPALA-3178) where DROP DATABASE CASCADE
   wasn't implemented for Kudu tables and silently ignored. The Kudu
   tables wouldn't be removed in Kudu.
8) Remove DDL delegates. There was only one functional delegate (for
   Kudu) the existence of the other delegate and the use of delegates in
   general has led to confusion. The Kudu delegate only exists to provide
   functionality missing from Hive.
9) Add PRIMARY KEY at the column and table level. This syntax is fairly
   standard. When used at the column level, only one column can be
   marked as a key. When used at the table level, multiple columns can
   be used as a key. Only Kudu tables are allowed to use PRIMARY KEY.
   The old "kudu.key_columns" table property is no longer accepted
   though it is still used internally. "PRIMARY" is now a keyword.
   "KEY" is expected to be common enough that the ident style
   declaration is used instead to avoid conflicts.
10) For managed tables, infer a Kudu table name if none was given.
   The table property "kudu.table_name" is optional for managed tables
   and is required for external tables. If for a managed table a Kudu
   table name is not provided, a table name will be generated based
   on the HMS database and table name. If the database is "default",
   then a Kudu table name will not include the database.
11) Use Kudu master as the source of truth for table metadata instead
   of HMS when a table is loaded or refreshed. Table/column metadata
   are cached in the catalog and are stored in HMS in order to be
   able to use table and column statistics.

Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
---
M be/src/service/frontend.cc
M bin/start-catalogd.sh
M bin/start-impala-cluster.py
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableDataSrcStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/DistributeParam.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
A fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java
A fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
D fe/src/main/java/org/apache/impala/catalog/delegates/DdlDelegate.java
D fe/src/main/java/org/apache/impala/catalog/delegates/KuduDdlDelegate.java
D 
fe/src/main/java/org/apache/impala/catalog/delegates/UnsupportedOpDelegate.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionFilter.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
A fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
A fe/src/main/java/org/apache/impala/util/KuduClient.

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-05 Thread Matthew Jacobs (Code Review)

Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 3:

(14 comments)

next batch... I'm probably about 1/2 way through.

http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java:

PS1, Line 275: 
 : 
 : 
 : 
 : 
 : 
 : 
 : 
> Actually, I don't find it such a terrible idea. The sooner we figure out in
It's just more code to support and test...


http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/analysis/DistributeParam.java
File fe/src/main/java/com/cloudera/impala/analysis/DistributeParam.java:

PS1, Line 138: 
 : 
 : 
> There is a restriction on the types of primary keys (no bool, float or doub
Thanks, makes sense


http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java:

Line 87:   InsertStmt getInsertStmt() { return insertStmt_; }
why change visibility?


http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java:

PS3, Line 344:   
nit extra space


PS3, Line 344: getTblPrimaryKeyColumnNames(),  null,
Why would there be PK col names?

Can you add a neg test case to AnalyzeDDLTest.java?

AnalysisError("create table newkudutbl like kudu 
'/test-warehouse/schemas/alltypestiny.parquet'")


http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java:

PS3, Line 167: if (getFileFormat() != THdfsFileFormat.KUDU && 
KuduTable.KUDU_STORAGE_HANDLER.equals(
 : getTblProperties().get(KuduTable.KEY_STORAGE_HANDLER))) {
 :   throw new 
AnalysisException(KUDU_STORAGE_HANDLER_ERROR_MESSAGE);
 : }
 : 
 : // Avro tables can have empty column defs because they can 
infer them from the Avro
 : // schema. Likewise for external Kudu tables, the schema can 
be read from Kudu.
 : if (getColumnDefs().isEmpty() && getFileFormat() != 
THdfsFileFormat.AVRO
 : && getFileFormat() != THdfsFileFormat.KUDU) {
 :   throw new AnalysisException("Table requires at least 1 
column");
 : }
 : 
 : if (getFileFormat() == THdfsFileFormat.KUDU) {
 :   analyzeKuduFormat(analyzer);
 : } else {
 :   
AnalysisUtils.throwIfNotNullOrNotEmpty(getDistributeParams(),
 :   "Only Kudu tables can use DISTRIBUTE BY clause.");
 :   throwIfPrimaryKeysWereSpecified("Only Kudu tables can 
specify a PRIMARY KEY.");
 : }
 : 
 : if (getFileFormat() == THdfsFileFormat.AVRO) {
 :   setColumnDefs(analyzeAvroSchema(analyzer));
 :   if (getColumnDefs().isEmpty()) {
 : throw new AnalysisException(
 : "An Avro table requires column definitions or an 
Avro schema.");
 :   }
 :   AvroSchemaUtils.setFromSerdeComment(getColumnDefs());
 : }
 :   }
what if we were to move everything Kudu related here into analyzeKuduFormat() 
which first checks if it is Kudu. If it is not, it can make sure it _doesn't_ 
have the distribute or PK params, then perform the storage handler check, then 
return. If it is a Kudu table it'd mostly be the same as it is. Basically we 
could just always call the fn and it handles Kudu or no-Kudu checks 
appropriately. Then we'd get rid of some of the Kudu mess here.


PS3, Line 279: throw new AnalysisException(String.format("Column '%s' 
(type: '%s') cannot " +
 : "be used as a primary key. Keys must be integer or 
string types.",
 : pkColDef.getColName(), pkColDef.getType().toSql()));
I don't see a test case for this error


PS3, Line 277:   ColumnDef pkColDef = getPrimaryKeyColumnDefs().get(i);
 :   if (!KuduUtil.isSupportedKeyType(pkColDef.getType())) {
 : throw new AnalysisException(String.format("Column '%s' 
(type: '%s') cannot " +
 : "be used as a primary key. Keys must be integer or 
string types.",
 : pkColDef.getColName(), pkColDef.getType().toSql()));
 :   }
 :   if (!pkColDef.equals(getColumnDefs().get(i))) {

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-10-05 Thread Matthew Jacobs (Code Review)

Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 3:

(11 comments)

Not even close to done but want to start giving you feedback in pieces so we 
can parallelize a bit.

http://gerrit.cloudera.org:8080/#/c/4414/3/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

PS3, Line 344: // Enum listing all possible DISTRIBUTE BY types
 : enum TDistributeType {
 :   HASH,
 :   RANGE,
I don't see this used


PS3, Line 350: // Parameters needed for hash distribution
 : struct TDistributeByHashParam {
 :   1: required list columns
 :   2: required i32 num_buckets
 : }
 : 
 : struct TRangeLiteral {
 :   1: optional i64 int_literal
 :   2: optional string string_literal
 : }
 : 
 : struct TRangeLiteralList {
 :   1: required list values
 : }
 : 
 : // A range distribution is identified by a list of columns and a 
series of split rows.
 : struct TDistributeByRangeParam {
 :   1: required list columns
 :   2: optional list split_rows;
 : }
 : 
 : // Parameters for the DISTRIBUTE BY clause. The actual 
distribution is identified by
 : // the type parameter.
 : struct TDistributeParam {
 :   // Set if type is set to HASH
 :   1: optional TDistributeByHashParam by_hash_param;
 : 
 :   // Set if type is set to RANGE
 :   2: optional TDistributeByRangeParam by_range_param;
 : }
I see that these are used in the serialized catalog objects, but given that 
none of this is stored in the metastore nor do we require it to be stored in 
the catalog, I don't see why we bother keeping all this in the thrift table 
object.

Can we just remove it and not serialize this info? Hopefully we can even split 
the distribute/partition parameters out of CREATE TABLE eventually, anyway.


PS3, Line 390: 
 :   // Distribution schemes
 :   4: required list distribute_by
same for this, can we get rid of it?


http://gerrit.cloudera.org:8080/#/c/4414/3/common/thrift/JniCatalog.thrift
File common/thrift/JniCatalog.thrift:

PS3, Line 398: CatalogObjects.TDistributeParam
wrt my comments in CatalogObjects, I guess we'd need them here, but I don't see 
why we bother storing them in the serialized catalog table object.


http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/ColumnDefOptions.java
File fe/src/main/java/org/apache/impala/analysis/ColumnDefOptions.java:

PS3, Line 21:  * Placeholder for the column definition options of a CREATE 
TABLE statement.
:  * Contains the list of column definitions and, optionally, the 
list of column names
:  * specified using the PRIMARY KEY keyword.
this feels like a weird abstraction... do we really need this class?


http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java
File fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java:

PS3, Line 24: class TableDataLayout {
: 
:   private final List partitionColDefs_;
:   private final List distributeParams_;
Not that we're doing this in this review, but we need to think about how this 
might change when we do the next steps for Kudu partitioning.


PS3, Line 49:   List getPartitionColumnDefs() { return 
partitionColDefs_; }
:   List getDistributeParams() { return 
distributeParams_; }
any reason these aren't public?


http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

PS3, Line 1: // Copyright 2016 Cloudera Inc.
   : //
   : // Licensed under the Apache License, Version 2.0 (the "License");
   : // you may not use this file except in compliance with the License.
   : // You may obtain a copy of the License at
   : //
   : // http://www.apache.org/licenses/LICENSE-2.0
   : //
   : // Unless required by applicable law or agreed to in writing, 
software
   : // distributed under the License is distributed on an "AS IS" 
BASIS,
   : // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 
implied.
   : // See the License for the specific language governing permissions 
and
   : // limitations under the License.
this and maybe others need the new license header:

// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements.  See the NOTICE file
// distributed

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-09-30 Thread Dimitris Tsirogiannis (Code Review)

Dimitris Tsirogiannis has uploaded a new patch set (#3).

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..

IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

With this commit we simplify the syntax and handling of CREATE TABLE
statements for both managed and external Kudu tables.

Syntax example:
CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b))
DISTRIBUTE BY HASH (a) INTO 3 BUCKETS,
RANGE (b) SPLIT ROWS (('abc', 'def'))
STORED AS KUDU

Changes:
1) Remove the requirement to specify table properties such as key
   columns in tblproperties.
2) Read table schema (column definitions, primary keys, and distribution
   schemes) from Kudu instead of the HMS.
3) For external tables, the Kudu table is now required to exist at the
   time of creation in Impala.
4) Disallow table properties that could conflict with an existing
   table. Ex: key_columns cannot be specified.
5) Add KUDU as a file format.
6) Add a startup flag to impalad to specify the default Kudu master
   addresses. The flag is used as the default value for the table
   property kudu_master_addresses but it can still be overriden
   using TBLPROPERTIES.
7) Fix a post merge issue (IMPALA-3178) where DROP DATABASE CASCADE
   wasn't implemented for Kudu tables and silently ignored. The Kudu
   tables wouldn't be removed in Kudu.
8) Remove DDL delegates. There was only one functional delegate (for
   Kudu) the existence of the other delegate and the use of delegates in
   general has led to confusion. The Kudu delegate only exists to provide
   functionality missing from Hive.
9) Add PRIMARY KEY at the column and table level. This syntax is fairly
   standard. When used at the column level, only one column can be
   marked as a key. When used at the table level, multiple columns can
   be used as a key. Only Kudu tables are allowed to use PRIMARY KEY.
   The old "kudu.key_columns" table property is no longer accepted
   though it is still used internally. "PRIMARY" is now a keyword.
   "KEY" is expected to be common enough that the ident style
   declaration is used instead to avoid conflicts.
10) For managed tables, infer a Kudu table name if none was given.
   The table property "kudu.table_name" is optional for managed tables
   and is required for external tables. If for a managed table a Kudu
   table name is not provided, a table name will be generated based
   on the HMS database and table name. If the database is "default",
   then a Kudu table name will not include the database.
11) Use Kudu master as the source of truth for table metadata instead
   of HMS when a table is loaded or refreshed. Table/column metadata
   are cached in the catalog and are stored in HMS in order to be
   able to use table and column statistics.

Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
---
M be/src/service/frontend.cc
M bin/start-catalogd.sh
M bin/start-impala-cluster.py
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
A fe/src/main/java/org/apache/impala/analysis/ColumnDefOptions.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableDataSrcStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/DistributeParam.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
A fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java
A fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
D fe/src/main/java/org/apache/impala/catalog/delegates/DdlDelegate.java
D fe/src/main/java/org/apache/impala/catalog/delegates/KuduDdlDelegate.java
D 
fe/src/main/java/org/apache/impala/catalog/delegates/UnsupportedOpDelegate.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionFilter.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
A fe/src/main/java/org/apache/impala/service/KuduCatalogO

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-09-27 Thread Dimitris Tsirogiannis (Code Review)

Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 1:

(94 comments)

http://gerrit.cloudera.org:8080/#/c/4414/1//COMMIT_MSG
Commit Message:

PS1, Line 21: When attempting to create an external
:   table "foo" in database "bar", Impala will search for a Kudu 
table
:   name "foo.bar" and "bar" (Kudu doesn't have database name spaces
:   yet.)
> I think this sentence is duplicated by #10. Maybe remove this part. This it
Done


PS1, Line 25:  The Kudu table is now required to exist at the time of creation 
in
:   Impala.
> for external tables? or if this is more general can you explain a bit more?
Done


PS1, Line 31: default value
> can it still be overridden?
Done


PS1, Line 39: Eventually Hive should have the needed
:   functionality and the Kudu delegate (renamed in this patch to 
KuduCatalogOpExecutor)
:   can be removed.
> maybe, it's not clear if impala would want to keep the ability to do this o
Done


PS1, Line 54:  If the database is "default" then
:   the Kudu table name will not include the database.
> hm... I wonder if it'd be easier just to keep it standardized and always ha
Done


PS1, Line 56: 11) Several improvements in the grammar related to the family
:   of CREATE TABLE statements.
> I don't think this bullet adds much value to the commit message. Everything
Done


PS1, Line 58: 12) Added new tests and modified existing Kudu test to use the new
:   CREATE TABLE syntax.
> I think this can be removed too, it should be implied there are tests and i
Done


PS1, Line 60: insteads
> instead
Done


PS1, Line 60: source of truth
> Does this mean we always hit Kudu for metadata? In the Catalog? In the impa
Done


PS1, Line 64: Not included in this commit:
: - Additional column properties such as nullability and compression
: encodings.
> I don't think you need to include this, we have a separate JIRA for this.
Done


http://gerrit.cloudera.org:8080/#/c/4414/1/be/src/catalog/catalog.cc
File be/src/catalog/catalog.cc:

PS1, Line 47: {"", "(ZILjava/lang/String;IIZLjava/lang/String;)V",
:   &catalog_ctor_},
> 1 line
Done


http://gerrit.cloudera.org:8080/#/c/4414/1/be/src/service/frontend.cc
File be/src/service/frontend.cc:

PS1, Line 37: // XXX: This flag doesn't seem to be used anywhere. Maybe remove 
it?
: DEFINE_bool(load_catalog_at_startup, false, "if true, load all 
catalog data at startup");
> agreed. doesn't look like it's used by JniFrontend. Let's remove it.
Done


PS1, Line 63: IPs
> IP addresses.
Done


PS1, Line 63:  
> no space
Done


http://gerrit.cloudera.org:8080/#/c/4414/1/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

PS1, Line 52: THdfsFileFormat
> Maybe not for this patch since it's already huge, but it'd be great to gene
I agree we need to revisit this. Left a TODO for now.


PS1, Line 61: THdfsCompression
> similarly, this seems unnecessarily specific to Hdfs. Not necessarily somet
Left TODO.


http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

> FYI I'm not looking at this very carefully because it was reviewed previous
Only few things changed mostly wrt specifying the PRIMARY KEY keyword  inside 
the column definition list.


PS1, Line 31: ColumnDefOptions
> this class doesn't exist, please add the missing file
Done


PS1, Line 405:   view_column_def_list, view_column_defs;
> separate nonterminals if they dont fit?
Done


PS1, Line 458: key_ident;
> does this not need Boolean like those below?
Done


http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java:

> I'm not in love with the TableDef and TableDefOptions stuff because it feel
As we spoke offline, the initial attempt to refactor this logic didn't succeed. 
I will not change these classes but I'll make an effort to separate the 
Kudu-specific logic at the function level.


PS1, Line 240: 
getColumnDefs().add(KuduCatalogOpExecutor.REQUIRED_THRIFT_COLUMN_DEF);
> is this needed? I thought there was going to be code that fetches the colum
We do, but that happens in the catalog. This was added to satisfy the thrift 
spec.


PS1, Line 242:   if 
(!getTblProperties().containsKey(KuduTable.KEY_TABLE_NAME)) {
 : getTblProperties().put(KuduTable.KEY_TABLE_NAME,
 : KuduUtil.getDefaultCreateKuduTableName(getDb(), 
getTbl()));
 :   }
> hm... can we simplify this and only allow internal tables to be created wit
This is executed only for managed tables. Is this what you meant? Also, I 
changed the prefix to be __IMPALA__db.table;

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-09-27 Thread Dimitris Tsirogiannis (Code Review)

Dimitris Tsirogiannis has uploaded a new patch set (#2).

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..

IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

With this commit we simplify the syntax and handling of CREATE TABLE
statements for both managed and external Kudu tables.

Syntax example:
CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b))
DISTRIBUTE BY HASH (a) INTO 3 BUCKETS,
RANGE (b) SPLIT ROWS (('abc', 'def'))
STORED AS KUDU

Changes:
1) Remove the requirement to specify table properties such as key
   columns in tblproperties.
2) Read table schema (column definitions, primary keys, and distribution
   schemes) from Kudu instead of the HMS.
3) For external tables, the Kudu table is now required to exist at the
   time of creation in Impala.
4) Disallow table properties that could conflict with an existing
   table. Ex: key_columns cannot be specified.
5) Add KUDU as a file format.
6) Add a startup flag to impalad to specify the default Kudu master
   addresses. The flag is used as the default value for the table
   property kudu_master_addresses but it can still be overriden
   using TBLPROPERTIES.
7) Fix a post merge issue (IMPALA-3178) where DROP DATABASE CASCADE
   wasn't implemented for Kudu tables and silently ignored. The Kudu
   tables wouldn't be removed in Kudu.
8) Remove DDL delegates. There was only one functional delegate (for
   Kudu) the existence of the other delegate and the use of delegates in
   general has led to confusion. The Kudu delegate only exists to provide
   functionality missing from Hive.
9) Add PRIMARY KEY at the column and table level. This syntax is fairly
   standard. When used at the column level, only one column can be
   marked as a key. When used at the table level, multiple columns can
   be used as a key. Only Kudu tables are allowed to use PRIMARY KEY.
   The old "kudu.key_columns" table property is no longer accepted
   though it is still used internally. "PRIMARY" is now a keyword.
   "KEY" is expected to be common enough that the ident style
   declaration is used instead to avoid conflicts.
10) For managed tables, infer a Kudu table name if none was given.
   The table property "kudu.table_name" is optional for managed tables
   and is required for external tables. If for a managed table a Kudu
   table name is not provided, a table name will be generated based
   on the HMS database and table name. If the database is "default",
   then a Kudu table name will not include the database.
11) Use Kudu master as the source of truth for table metadata instead
   of HMS when a table is loaded or refreshed. Table/column metadata
   are cached in the catalog and are stored in HMS in order to be
   able to use table and column statistics.

Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
---
M be/src/service/frontend.cc
M bin/start-catalogd.sh
M bin/start-impala-cluster.py
M common/thrift/CatalogObjects.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/com/cloudera/impala/analysis/AnalysisUtils.java
M fe/src/main/java/com/cloudera/impala/analysis/ColumnDef.java
A fe/src/main/java/com/cloudera/impala/analysis/ColumnDefOptions.java
M fe/src/main/java/com/cloudera/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/com/cloudera/impala/analysis/CreateTableDataSrcStmt.java
M fe/src/main/java/com/cloudera/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/com/cloudera/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java
M fe/src/main/java/com/cloudera/impala/analysis/DistributeParam.java
M fe/src/main/java/com/cloudera/impala/analysis/ModifyStmt.java
A fe/src/main/java/com/cloudera/impala/analysis/TableDataLayout.java
A fe/src/main/java/com/cloudera/impala/analysis/TableDef.java
M fe/src/main/java/com/cloudera/impala/analysis/ToSqlUtils.java
M fe/src/main/java/com/cloudera/impala/catalog/Catalog.java
M fe/src/main/java/com/cloudera/impala/catalog/Db.java
M fe/src/main/java/com/cloudera/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/com/cloudera/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java
M fe/src/main/java/com/cloudera/impala/catalog/Table.java
M fe/src/main/java/com/cloudera/impala/catalog/TableLoader.java
D fe/src/main/java/com/cloudera/impala/catalog/delegates/DdlDelegate.java
D fe/src/main/java/com/cloudera/impala/catalog/delegates/KuduDdlDelegate.java
D 
fe/src/main/java/com/cloudera/impala/catalog/delegates/UnsupportedOpDelegate.java
M fe/src/main/java/com/cloudera/impala/planner/HdfsPartitionFilter.java
M fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
M fe/src/main/java/com/cloudera/impala/service/Frontend.java
M fe/src/main/java/com/cloudera/impala/service/JniCatalog.java
M fe/src/main/java/com/cloudera/impala/service/JniFrontend.java
A

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-09-21 Thread Matthew Jacobs (Code Review)

Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 1:

(3 comments)

sorry, more comments

I keep finding new things as I'm adding some new test cases myself.

http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java
File fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java:

PS1, Line 87: 
:   // Key to specify the number of tablet replicas.
:   // TODO(KUDU): Allow modification in alter table.
:   public static final String KEY_TABLET_REPLICAS = 
"kudu.num_tablet_replicas";
> I don't think we need to store this, even if we wanted to modify this later
After looking a bit more, I see that this is the way the user specifies the # 
of replicas and there is no other way. I guess it's weird to strip out table 
properties added by the user, so I suppose we should keep it. Instead, it'd be 
worth us considering separating out options that need to be persisted (i.e. we 
need the mapping to kudu tbl name) vs things that just get passed through and 
we shouldn't get in the business of trying to replicate in our own metadata. 
Probably not P1 but worth a JIRA since this is user facing and will sit in 
metadata.

Also:
1) the TODO(KUDU) that is here could be removed.
2) would be worth testing how this behaves with CREATE TABLE LIKE. I'd expect 
it to inherit this value but it'd be worth checking.


http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/test/java/com/cloudera/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/com/cloudera/impala/analysis/AnalyzeDDLTest.java:

PS1, Line 28: import com.cloudera.impala.catalog.KuduTable;
: import junit.framework.Assert;
nit: organize imports


http://gerrit.cloudera.org:8080/#/c/4414/1/testdata/workloads/functional-query/queries/QueryTest/create_kudu.test
File testdata/workloads/functional-query/queries/QueryTest/create_kudu.test:

> we might still need to have explicit create table tests.
Also, I don't see any tests for CTAS here or anywhere.

Can you add a few positive (e.g. from kudu_functional.alltypes) and negative 
tests (e.g. involving datatypes that kudu doesn't support, e.g. 
functional.alltypes and functional.decimaltbl ).


-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-09-21 Thread Matthew Jacobs (Code Review)

Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 1:

(9 comments)

... a few more things

http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/analysis/ToSqlUtils.java
File fe/src/main/java/com/cloudera/impala/analysis/ToSqlUtils.java:

PS1, Line 223: Preconditions.checkNotNull(kuduDistributeByParams);
This breaks for tables that were created before this change.


http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java
File fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java:

PS1, Line 87: 
:   // Key to specify the number of tablet replicas.
:   // TODO(KUDU): Allow modification in alter table.
:   public static final String KEY_TABLET_REPLICAS = 
"kudu.num_tablet_replicas";
I don't think we need to store this, even if we wanted to modify this later. 
Further, it doesn't look like the API exposes a way to do that anyway.

Can you add a TODO or remove this?


PS1, Line 92:   public static final String KEY_DISTRIBUTE_BY = 
"kudu.distribute_by";
I'm not sure we should be storing this. The Kudu client exposes 
KuduTable.getPartitionSchema(). It looks like it's just for use by SHOW CREATE 
TABLE, and it'd be good for us to use the PartitionSchema from them which we 
should probably be able to stringify.

I'm OK with leaving that for later as long as you think it won't be an issue 
for dev environments to upgrade between state now -> this review as-is -> 
making the change i suggested


http://gerrit.cloudera.org:8080/#/c/4414/1/testdata/workloads/functional-query/queries/QueryTest/create_kudu.test
File testdata/workloads/functional-query/queries/QueryTest/create_kudu.test:

we might still need to have explicit create table tests.

Specifically we should test that you can create an internal table with a 
different kudu table name by overriding 'kudu.table_name'. We could basically 
modify the managed_kudu table test case that's here to do so.

I think we need to find a way to modify the kudu schema (e.g. drop and add 
columns) and see if we handle gracefully. Seems OK to leave that as a future 
task, but we should at least know how we handle those cases.


http://gerrit.cloudera.org:8080/#/c/4414/1/testdata/workloads/functional-query/queries/QueryTest/kudu-show-create.test
File 
testdata/workloads/functional-query/queries/QueryTest/kudu-show-create.test:

well, we should have some coverage of show create table...


http://gerrit.cloudera.org:8080/#/c/4414/1/testdata/workloads/functional-query/queries/QueryTest/kudu_partition_ddl.test
File 
testdata/workloads/functional-query/queries/QueryTest/kudu_partition_ddl.test:

PS1, Line 15: 
: 
: 
: 
: 
: 
: 
: 
: 
: 
: 
I think this kind of error should still be covered. You can still mismatch PK 
cols and dist by columns.


PS1, Line 37: 
: 
we should still cover this as well.


PS1, Line 50: 
: 
same


Some more questions about losing coverage. Fine if this is now covered by FE 
tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-09-19 Thread Matthew Jacobs (Code Review)

Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4414/1/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:

PS1, Line 196: primay
primary


http://gerrit.cloudera.org:8080/#/c/4414/1/tests/common/__init__.py
File tests/common/__init__.py:

does this need to be its own file?


http://gerrit.cloudera.org:8080/#/c/4414/1/tests/common/kudu_test_suite.py
File tests/common/kudu_test_suite.py:

PS1, Line 54:   @classmethod
:   def get_db_name(cls):
: # When py.test runs with the xdist plugin, several processes 
are started and each
: # process runs some partition of the tests. It's possible 
that multiple processes
: # will call this method. A random value is generated so the 
processes won't try
: # to use the same database at the same time. The value is 
cached so within a single
: # process the same database name is always used for the 
class. This doesn't need to
: # be thread-safe since multi-threading is never used.
: if not cls.__DB_NAME:
:   cls.__DB_NAME = \
:   choice(ascii_lowercase) + 
"".join(sample(ascii_lowercase + digits, 5))
: return cls.__DB_NAME
I've always disliked this function... I think we should try to use the db 
fixture that Michael Brown added. Maybe I'm missing something though. It'd be 
good to get Michael's input here.


PS1, Line 127: mapping = {BOOL: "BOOLEAN",
 : DOUBLE: "DOUBLE",
 : FLOAT: "FLOAT",
 : INT16: "SMALLINT",
 : INT32: "INT",
 : INT64: "BIGINT",
 : INT8: "TINYINT",
 : STRING: "STRING"}
this gets defined as a local every function invocation, right? how about 
creating the map outside the fn so its static.


http://gerrit.cloudera.org:8080/#/c/4414/1/tests/custom_cluster/test_kudu.py
File tests/custom_cluster/test_kudu.py:

PS1, Line 25: TestRedaction
?


PS1, Line 42: temp_kudu_table
I'd think we could have the test fn take the database fixture and then pass it 
to temp_kudu_table


-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-09-16 Thread Matthew Jacobs (Code Review)

Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 1:

(31 comments)

Ok, part 3. I have one more batch left...

http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java
File fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java:

PS1, Line 222: 
 : 
 : 
 : 
 : 
 : 
 : 
was this moved somewhere else? Maybe I'll find it as I keep reading...


PS1, Line 238: // Update the HMS
 : if (reuseMetadata) {
 :   try {
 : client.alter_table(msTable_.getDbName(), 
msTable_.getTableName(), msTable_);
 :   } catch (TException e) {
 : throw new TableLoadingException(e.getMessage());
 :   }
 : }
Same deal, I'm not sure why reuseMetadata means update the HMS.


http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
File fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java:

PS1, Line 220: 
 :   private final static KuduCatalogOpExecutor kuduExecutor_
 :   = new KuduCatalogOpExecutor();
all static methods, why instantiate it?


PS1, Line 1122:  // The db == null case isn't handled. The only tables this 
should matter for are
  : // Kudu tables. The expectation is Impala will always know 
about any Kudu tables
  : // because Hive doesn't support Kudu yet. 
I've always had a problem with this comment :/ even after trimming it down I 
don't think it makes complete sense. This should mention briefly what this code 
_is handling_ before it talks about what it's not handling. Can you add 1 
sentence at the beginning?


PS1, Line 1124: When Hive supports Kudu the DDL delegates
  : // can be removed. 
https://issues.cloudera.org/browse/IMPALA-3424 tracks the removal.
Can you remove these sentences and just leave IMPALA-3424 as a reference? We 
may not remove our kudu code even if/when Hive supports Kudu.


PS1, Line 1127: ,
nit: extra comma


PS1, Line 1150: // The operation will be aborted if the Kudu table 
cannot be dropped. If for
  : // some reason Kudu is permanently stuck in a 
non-functional state, the user is
  : // expected to ALTER TABLE to either set the table to 
UNMANAGED or set the format
  : // to something else.
JIRA? Even if we don't fix it we should have something public to point to 
regarding this behavior. Also to make sure the docs team knows to document it.


PS1, Line 1205: kuduExecutor_
static ref


PS1, Line 1406: if (KuduTable.isKuduTable(tbl)) {
  :   return createKuduTable(tbl, params.if_not_exists, 
params.getDistribute_by(),
  :   response);
  : }
  : return createTable(tbl, params.if_not_exists, 
params.getCache_op(), response);
  :   }
it's too bad we have to just branch like this and handle everything 
differently, but I can't think of anything better.


PS1, Line 1420:  createMetaStoreTable
I'd prefer to have this on the prev line and the parameter on the new line


Line 1475:* creation of a managed Kudu table.
comment on response param


PS1, Line 1480: TDdlExecResponse response)
I think this just fits on the prev line


PS1, Line 1483: kuduExecutor_
static reference KuduCatalogOpExecutor, here elsewhere in this fn


PS1, Line 1509:   // Add the table to the catalog cache
  :   Table newTbl = catalog_.addTable(newTable.getDbName(), 
newTable.getTableName());
  :   addTableToCatalogUpdate(newTbl, response.result);
While I don't think these will throw, it might be worth wrapping all the logic 
after the Kudu create table in a try { } catch block that drops the kudu table. 
That'll future proof this a bit. Feel free to ignore though.


http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/service/Frontend.java
File fe/src/main/java/com/cloudera/impala/service/Frontend.java:

PS1, Line 230: impaladCatalog_.getDefaultKuduMasterAddrs()
this is fine but a little weird, normally I'm all about removing extra state 
but in this case it might be better to keep this as a parameter on frontend and 
pass it in. Unless there's somewhere else besides the old impaladCatalog we can 
get it?


http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/service/JniCatalog.java
File fe/src/main/java/com/cloudera/impala/service/JniCatalog.java:

PS1, Line 83: 
:   int otherLogLevel, boolean allowAuthToLocal, String 
kerberosPrincipal)
wrapping


http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/servi

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-09-16 Thread Matthew Jacobs (Code Review)

Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 1:

(17 comments)

another batch of comments, still a lot of files i haven't touched yet...

http://gerrit.cloudera.org:8080/#/c/4414/1/be/src/catalog/catalog.cc
File be/src/catalog/catalog.cc:

PS1, Line 47: {"", "(ZILjava/lang/String;IIZLjava/lang/String;)V",
:   &catalog_ctor_},
1 line


http://gerrit.cloudera.org:8080/#/c/4414/1/be/src/service/frontend.cc
File be/src/service/frontend.cc:

PS1, Line 37: // XXX: This flag doesn't seem to be used anywhere. Maybe remove 
it?
: DEFINE_bool(load_catalog_at_startup, false, "if true, load all 
catalog data at startup");
agreed. doesn't look like it's used by JniFrontend. Let's remove it.


PS1, Line 63: IPs
IP addresses.


PS1, Line 63:  
no space


http://gerrit.cloudera.org:8080/#/c/4414/1/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

PS1, Line 52: THdfsFileFormat
Maybe not for this patch since it's already huge, but it'd be great to 
generalize this if we can. I can think of two improvements:
1) Maybe we should model the storage layer, e.g. have a TStorageEngine, then 
make this TFileFormat (perhaps). This is probably a big change.
2) Rename this to be TStorageFormat, which kind of addresses #1 but doesn't 
separate out storage engines and file formats.


PS1, Line 61: THdfsCompression
similarly, this seems unnecessarily specific to Hdfs. Not necessarily something 
to change now but maybe we can create a follow-up JIRA to clean this up.


http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

PS1, Line 31: ColumnDefOptions
this class doesn't exist, please add the missing file


http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/analysis/DistributeParam.java
File fe/src/main/java/com/cloudera/impala/analysis/DistributeParam.java:

PS1, Line 84: BigDecimal
why BigDecimal? Ultimately this has to resolve to some int for kudu's API. (We 
can check if it's 32 or 64bit).


PS1, Line 110: <= 1)
is 1 bucket actually not valid?


PS1, Line 138: colType.isStringType() && !exprType.isStringType()
 :   || colType.isIntegerType() && 
(!exprType.isIntegerType()
 :   || exprType.getPrecision() > 
colType.getPrecision())
1. I don't see anything in the Kudu client that explicitly says you can't 
partition on any particular types. This code will exclude boolean and floating 
pt types, which is maybe unnecessary. 
2. Esp if we can address #1, is there a cleaner way to make sure the types are 
valid rather than enumerating the kinds of types to consider? I'm not sure, but 
maybe one of the frontend gurus can think of something.


http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/catalog/HdfsFileFormat.java
File fe/src/main/java/com/cloudera/impala/catalog/HdfsFileFormat.java:

PS1, Line 39: HdfsFileFormat
Can you open a JIRA (and leave it in the comment) to refactor this later?


http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java
File fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java:

PS1, Line 79: 
:   // TODO we should have something like 
KuduConfig.getDefaultConfig()
do you know what this means / can it be removed now that we're adding default 
master addrs?


Line 92:   public static final String KEY_DISTRIBUTE_BY = "kudu.distribute_by";
comment


Line 153:* Load the columns from the schema list
can you add a comment about error handling in this function?


PS1, Line 158:   LOG.error(String.format("Kudu tables must have at least 
one"
 :   + "key column (had %d), and no more key columns than 
there are table columns "
 :   + "(had %d).", keyColumns.size(), schema.size()));
shouldn't this still fail? if not, can you add a comment why this continues?


PS1, Line 184:   LOG.error(String.format("Some key columns were not found 
in"
 :   + " the set of columns. List of column names: %s, 
List of key column names:"
 :   + " %s", Iterables.toString(columnNames), 
Iterables.toString(keyColumns)));
why do we continue?


PS1, Line 199: // Get the table metadata from Kudu
 : if (reuseMetadata) {
I'm confused about this. It's not clear to me from the name 'reuseMetadata' why 
this means we should populate the metadata from Kudu. If anything, it sounds 
like it would be the opposite.

The base class comment just says "If 'reuseMetadata' is true, reuse valid 
existing metadata.".


-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-09-14 Thread Matthew Jacobs (Code Review)

Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..

Patch Set 1:

(28 comments)

Ok, here's a bit to get started. Just made my way through most of analysis.
Still have a lot to get through but figured I'll give you the feedback I have
while I do a few other things.

http://gerrit.cloudera.org:8080/#/c/4414/1//COMMIT_MSG
Commit Message:

PS1, Line 21: When attempting to create an external
: table "foo" in database "bar", Impala will search for a Kudu
table
: name "foo.bar" and "bar" (Kudu doesn't have database name spaces
: yet.)
I think this sentence is duplicated by #10. Maybe remove this part. This item
is about reading the columns, right? The issue of table name resolution is
separate IMO, and covered by #10, though consider moving them next together :)

PS1, Line 25: The Kudu table is now required to exist at the time of creation
in
: Impala.
for external tables? or if this is more general can you explain a bit more?

PS1, Line 31: default value
can it still be overridden?

PS1, Line 39: Eventually Hive should have the needed
: functionality and the Kudu delegate (renamed in this patch to
KuduCatalogOpExecutor)
: can be removed.
maybe, it's not clear if impala would want to keep the ability to do this on
its own... Remove?

PS1, Line 54: If the database is "default" then
: the Kudu table name will not include the database.
hm... I wonder if it'd be easier just to keep it standardized and always have
the name, i.e. using default in this case. What do you think? I'll see if I
change my mind after reading more code.

PS1, Line 56: 11) Several improvements in the grammar related to the family
: of CREATE TABLE statements.
I don't think this bullet adds much value to the commit message. Everything
else is more user visible, this is just some necessary code cleanup.

PS1, Line 58: 12) Added new tests and modified existing Kudu test to use the new
: CREATE TABLE syntax.
I think this can be removed too, it should be implied there are tests and it
takes away from the billion other "features" in this patch :)

PS1, Line 60: source of truth
Does this mean we always hit Kudu for metadata? In the Catalog? In the
impalad's Catalog?

PS1, Line 60: insteads
instead

PS1, Line 64: Not included in this commit:
: - Additional column properties such as nullability and compression
: encodings.
I don't think you need to include this, we have a separate JIRA for this.

http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

FYI I'm not looking at this very carefully because it was reviewed previously
https://gerrit.cloudera.org/#/c/2865/

Let me know if this has changed much since then.

PS1, Line 405: view_column_def_list, view_column_defs;
separate nonterminals if they dont fit?

PS1, Line 458: key_ident;
does this not need Boolean like those below?

http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java:

I'm not in love with the TableDef and TableDefOptions stuff because it feels
like that's parser implementation concerns bleeding into analysis code.

Do you think we need both? Can we fold the latter into TableDef?

If we have to keep both, it'd be helpful to at least group the functions in
this file based on their source (e.g. TableDef, TDOptions, or on this itself)
with comments around the groups? Maybe within those groups they could also be
sorted by visibility? There's a lot of stuff going on so I think it'd be
helpful to keep it tidy.

PS1, Line 240:
getColumnDefs().add(KuduCatalogOpExecutor.REQUIRED_THRIFT_COLUMN_DEF);
is this needed? I thought there was going to be code that fetches the columns
from Kudu and stores them.

PS1, Line 242: if
(!getTblProperties().containsKey(KuduTable.KEY_TABLE_NAME)) {
: getTblProperties().put(KuduTable.KEY_TABLE_NAME,
: KuduUtil.getDefaultCreateKuduTableName(getDb(),
getTbl()));
: }
hm... can we simplify this and only allow internal tables to be created with
the name db_name.table_name? Obviously there could be conflicts and it should
fail then. Then it'd be 1 less case to consider.

Also, I wonder if we should prefix the underlying kudu table names with
something, e.g. IMPALA_db.table?

PS1, Line 275: // Kudu's data distribution rules are enforced here. A
good reference is
: //
http://getkudu.io/docs/schema_design.html#data-distribution. As a summary:
: // * Only primary key column can be used in distribution
definitions
:

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-09-14 Thread Dimitris Tsirogiannis (Code Review)

Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..


Patch Set 1:

There are definitely some rough edges but I am posting the CR so that we can 
iterate through it sooner than later. I'll keep refining it but all comments 
are welcomed.

-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

2016-09-14 Thread Dimitris Tsirogiannis (Code Review)

Dimitris Tsirogiannis has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4414

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
..

IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables

With this commit we simplify the syntax and handling of CREATE TABLE
statements for both managed and external Kudu tables.

Syntax example:
CREATE TABLE foo(a INT, b STRING, PRIMARY KEY (a, b))
DISTRIBUTE BY HASH (a) INTO 3 BUCKETS,
RANGE (b) SPLIT ROWS (('abc', 'def'))
STORED AS KUDU

Changes:
1) Remove the requirement to specify table properties such as key
  columns in tblproperties.
2) Read table schema from Kudu. When attempting to create an external
  table "foo" in database "bar", Impala will search for a Kudu table
  name "foo.bar" and "bar" (Kudu doesn't have database name spaces
  yet.)
3) The Kudu table is now required to exist at the time of creation in
  Impala.
4) Disallow table properties that could conflict with an existing
  table. Ex: key_columns cannot be specified.
5) Add KUDU as a file format.
6) Add a startup flag to impalad to specify the default Kudu master
  addresses. The flag is used as the default value for the table
  property kudu_master_addresses.
7) Fix a post merge issue (IMPALA-3178) where DROP DATABASE CASCADE
  wasn't implemented for Kudu tables and silently ignored. The Kudu
  tables wouldn't be removed in Kudu.
8) Remove DDL delegates. There was only one functional delegate (for
  Kudu) the existence of the other delegate and the use of delegates in
  general has led to confusion. The Kudu delegate only exists to provide
  functionality missing from Hive. Eventually Hive should have the needed
  functionality and the Kudu delegate (renamed in this patch to 
KuduCatalogOpExecutor)
  can be removed.
9) Add PRIMARY KEY at the column and table level. This syntax is fairly
  standard. When used at the column level, only one column can be
  marked as a key. When used at the table level, multiple columns can
  be used as a key. Only Kudu tables are allowed to use PRIMARY KEY.
  The old "kudu.key_columns" table property is no longer accepted
  though it is still used internally. "PRIMARY" is now a keyword.
  "KEY" is expected to be common enough that the ident style
  declaration is used instead to avoid conflicts.
10) Infer a Kudu table name if none was given. The table property
  "kudu.table_name" is now optional. If not given, the Kudu table name
  will be created based on the Hive Metastore database and table name.
  "CREATE TABLE foo.bar (i INT PRIMARY KEY) STORED AS KUDU" will create
  a table in Kudu named "foo.bar". If the database is "default" then
  the Kudu table name will not include the database.
11) Several improvements in the grammar related to the family
  of CREATE TABLE statements.
12) Added new tests and modified existing Kudu test to use the new
  CREATE TABLE syntax.
13) Use Kudu master as the source of truth for table metadata insteads
  of HMS. Table/column metadata are still stored in HMS in order to be
  able to use table and column statistics.

Not included in this commit:
- Additional column properties such as nullability and compression
encodings.

Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
---
M be/src/catalog/catalog.cc
M be/src/service/frontend.cc
M bin/start-catalogd.sh
M bin/start-impala-cluster.py
M common/thrift/CatalogObjects.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/com/cloudera/impala/analysis/AnalysisUtils.java
M fe/src/main/java/com/cloudera/impala/analysis/ColumnDef.java
M fe/src/main/java/com/cloudera/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/com/cloudera/impala/analysis/CreateTableDataSrcStmt.java
M fe/src/main/java/com/cloudera/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/com/cloudera/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java
M fe/src/main/java/com/cloudera/impala/analysis/DistributeParam.java
A fe/src/main/java/com/cloudera/impala/analysis/TableDataLayout.java
A fe/src/main/java/com/cloudera/impala/analysis/TableDef.java
A fe/src/main/java/com/cloudera/impala/analysis/TableDefOptions.java
M fe/src/main/java/com/cloudera/impala/analysis/ToSqlUtils.java
M fe/src/main/java/com/cloudera/impala/catalog/Catalog.java
M fe/src/main/java/com/cloudera/impala/catalog/Db.java
M fe/src/main/java/com/cloudera/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/com/cloudera/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java
M fe/src/main/java/com/cloudera/impala/catalog/Table.java
D fe/src/main/java/com/cloudera/impala/catalog/delegates/DdlDelegate.java
D fe/src/main/java/com/cloudera/impala/catalog/delegates/KuduDdlDelegate.java
D 
fe/src/main/java/com/cloudera/impala/catalog/delegates/UnsupportedOpDelegate.java
M fe/src/main/java/com/cloudera/impala/plann

55 matches

Mail list logo