[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-10-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..

IMPALA-10164: Supporting HadoopCatalog for Iceberg table

This patch mainly realizes creating Iceberg table by HadoopCatalog.
We only supported HadoopTables api before this patch, but now we can
use HadoopCatalog to create Iceberg table. When creating managed table,
we can use SQL like this:
  CREATE TABLE default.iceberg_test (
level string,
event_time timestamp,
message string,
  )
  STORED AS ICEBERG
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog',
'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test');
We supported two values ('hadoop.catalog', 'hadoop.tables') for
'iceberg.catalog' now. If you don't specify this property in your SQL,
default catalog type is 'hadoop.catalog'.
As for external Iceberg table, you can use SQL like this:
  CREATE EXTERNAL TABLE default.iceberg_test_external
  STORED AS ICEBERG
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog',
'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test',
'iceberg.table_identifier'='default.iceberg_test');
We cannot set table location for both managed and external Iceberg
table with 'hadoop.catalog', and 'SHOW CREATE TABLE' will not display
table location yet. We need to use 'DESCRIBE FORMATTED/EXTENDED' to
get this location info.
'iceberg.catalog_location' is necessary for 'hadoop.catalog' table,
which used to reserved Iceberg table metadata and data, and we use this
location to load table metadata from Iceberg.
'iceberg.table_identifier' is used for Icebreg TableIdentifier.If this
property not been specified in SQL, Impala will use database and table name
to load Iceberg table, which is 'default.iceberg_test_external' in above SQL.
This property value is splitted by '.', you can alse set this value like this:
'org.my_db.my_tbl'. And this property is valid for both managed and external
table.

Testing:
- Create table tests in functional_schema_template.sql
- Iceberg table create test in test_iceberg.py
- Iceberg table query test in test_scanners.py
- Iceberg table show create table test in test_show_create_table.py

Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Reviewed-on: http://gerrit.cloudera.org:8080/16446
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/1-1-bc402da0-b562-4310-9001-06f9b6b0f9ae-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/6-6-d253aefa-65fc-4698-8f26-b155fc965cf6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/9-9-5d04b016-05e1-43fc-b4a0-0e0df52a5035-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-20b92523-c3b9-401d-b429-363c245dbe9c-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-c86370cf-10a1-4e49-86dc-b094fe739aa6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-f32f86fa-286f-4cd3-8337-98685c48176d-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-b18d2bbc-46a2-4040-a4a8-7488447de3b6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-c9bda250-ed1c-4868-bbf1-f2aad65fa80c-0.parquet
A 

[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-10-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 22: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 22
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Thu, 01 Oct 2020 13:54:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-10-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 22:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6495/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 22
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Thu, 01 Oct 2020 08:25:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-10-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 22: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 22
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Thu, 01 Oct 2020 08:25:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 21: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6490/


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 21
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Wed, 30 Sep 2020 19:42:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 21: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 21
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Wed, 30 Sep 2020 14:10:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-30 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 20:

Failed again with IMPALA-10143 so I uploaded a fix for that issue: 
https://gerrit.cloudera.org/#/c/16523/


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 20
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Wed, 30 Sep 2020 14:10:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 21:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6490/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 21
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Wed, 30 Sep 2020 14:10:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 20: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6488/


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 20
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Wed, 30 Sep 2020 13:45:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 20:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6488/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 20
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Wed, 30 Sep 2020 08:21:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-30 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 20: Code-Review+2

Those XFAIL failures can be omitted, not those the ones that make the build 
fail. However, there is a failing test in this job:
https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/12194/
I see test_full_acid_original_files failing but that seems to be flaky anyway 
so I'm sure it's unrelated to this change.
https://issues.apache.org/jira/browse/IMPALA-10143

Let me re-run the verify job.
Carry +2 from Zoltan.


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 20
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Wed, 30 Sep 2020 08:20:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 20:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7317/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 20
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Wed, 30 Sep 2020 02:17:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-29 Thread wangsheng (Code Review)
wangsheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 20:

gerrit-verify-dryrun failed, I got some message like that:
=== short test summary info 
XFAIL 
custom_cluster/test_alloc_fail.py::TestAllocFail::()::test_alloc_fail_update[protocol:
 beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
'abort_on_error': 1, 'exec_single_node_rows_threshold': 0}
| table_format: text/none]
  IMPALA-2925: the execution is not deterministic so some tests sometimes don't 
fail as expected
XFAIL 
custom_cluster/test_process_failures.py::TestProcessFailures::()::test_restart_catalogd
  reason: [NOTRUN] IMPALA-9848
XFAIL authorization/test_ranger.py::TestRanger::()::test_local_catalog_ownership
  reason: getTableIfCached() faulty behavior, known issue
=== 226 passed, 116 skipped, 3 xfailed, 6 xpassed in 5690.29 seconds ===

It this three xfailed cases caused this job failed? These three xfailed cases 
seems unrelated to Iceberg. My own core test on Jenkins based on patch  18 are 
passed: https://jenkins.impala.io/job/pre-review-test/723/


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 20
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Wed, 30 Sep 2020 02:02:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-29 Thread wangsheng (Code Review)
wangsheng has uploaded a new patch set (#20). ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..

IMPALA-10164: Supporting HadoopCatalog for Iceberg table

This patch mainly realizes creating Iceberg table by HadoopCatalog.
We only supported HadoopTables api before this patch, but now we can
use HadoopCatalog to create Iceberg table. When creating managed table,
we can use SQL like this:
  CREATE TABLE default.iceberg_test (
level string,
event_time timestamp,
message string,
  )
  STORED AS ICEBERG
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog',
'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test');
We supported two values ('hadoop.catalog', 'hadoop.tables') for
'iceberg.catalog' now. If you don't specify this property in your SQL,
default catalog type is 'hadoop.catalog'.
As for external Iceberg table, you can use SQL like this:
  CREATE EXTERNAL TABLE default.iceberg_test_external
  STORED AS ICEBERG
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog',
'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test',
'iceberg.table_identifier'='default.iceberg_test');
We cannot set table location for both managed and external Iceberg
table with 'hadoop.catalog', and 'SHOW CREATE TABLE' will not display
table location yet. We need to use 'DESCRIBE FORMATTED/EXTENDED' to
get this location info.
'iceberg.catalog_location' is necessary for 'hadoop.catalog' table,
which used to reserved Iceberg table metadata and data, and we use this
location to load table metadata from Iceberg.
'iceberg.table_identifier' is used for Icebreg TableIdentifier.If this
property not been specified in SQL, Impala will use database and table name
to load Iceberg table, which is 'default.iceberg_test_external' in above SQL.
This property value is splitted by '.', you can alse set this value like this:
'org.my_db.my_tbl'. And this property is valid for both managed and external
table.

Testing:
- Create table tests in functional_schema_template.sql
- Iceberg table create test in test_iceberg.py
- Iceberg table query test in test_scanners.py
- Iceberg table show create table test in test_show_create_table.py

Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/1-1-bc402da0-b562-4310-9001-06f9b6b0f9ae-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/6-6-d253aefa-65fc-4698-8f26-b155fc965cf6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/9-9-5d04b016-05e1-43fc-b4a0-0e0df52a5035-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-20b92523-c3b9-401d-b429-363c245dbe9c-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-c86370cf-10a1-4e49-86dc-b094fe739aa6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-f32f86fa-286f-4cd3-8337-98685c48176d-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-b18d2bbc-46a2-4040-a4a8-7488447de3b6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-c9bda250-ed1c-4868-bbf1-f2aad65fa80c-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/4-4-0ed77823-ded1-4a12-9e03-4027cd43966a-0.parquet
A 

[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 19: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6482/


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 19
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Tue, 29 Sep 2020 19:27:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 19:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6482/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 19
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Tue, 29 Sep 2020 13:59:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 19: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 19
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Tue, 29 Sep 2020 13:59:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-29 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 18: Code-Review+2

Thanks for applying the changes, LGTM!


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 18
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Tue, 29 Sep 2020 13:23:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 18:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7310/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 18
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Tue, 29 Sep 2020 12:53:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-29 Thread wangsheng (Code Review)
wangsheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 18:

(6 comments)

Thanks for review, Zoltan. I think it's better to keep these in a patch.

http://gerrit.cloudera.org:8080/#/c/16446/15//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16446/15//COMMIT_MSG@29
PS15, Line 29: table_iden
> Probably 'table_identifier' would be more appropriate as it would follow Ic
Done


http://gerrit.cloudera.org:8080/#/c/16446/15/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java:

http://gerrit.cloudera.org:8080/#/c/16446/15/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java@592
PS15, Line 592: handler != null && !IcebergTable.isIcebergStorageHa
> Are we sure that we want to disallow this?
Done


http://gerrit.cloudera.org:8080/#/c/16446/15/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/16446/15/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2582
PS15, Line 2582: if (params.if_not_exists &&
   : catalog_.containsTable(tableName.getDb(), 
tableName.getTbl()))
> Instead of figuring out the location, we should get it from Iceberg:
Done


http://gerrit.cloudera.org:8080/#/c/16446/15/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/16446/15/testdata/datasets/functional/functional_schema_template.sql@2969
PS15, Line 2969:
> nit: please start new line
Done


http://gerrit.cloudera.org:8080/#/c/16446/15/testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test:

http://gerrit.cloudera.org:8080/#/c/16446/15/testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test@154
PS15, Line 154:
> nit: please try to keep lines less than 90 chars in the QUERY section.
Done


http://gerrit.cloudera.org:8080/#/c/16446/15/testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
File 
testdata/workloads/functional-query/queries/QueryTest/show-create-table.test:

http://gerrit.cloudera.org:8080/#/c/16446/15/testdata/workloads/functional-query/queries/QueryTest/show-create-table.test@585
PS15, Line 585:
> nit: please try to keep lines less than 90 chars.
Done



--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 18
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Tue, 29 Sep 2020 12:34:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-29 Thread wangsheng (Code Review)
wangsheng has uploaded a new patch set (#18). ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..

IMPALA-10164: Supporting HadoopCatalog for Iceberg table

This patch mainly realizes creating Iceberg table by HadoopCatalog.
We only supported HadoopTables api before this patch, but now we can
use HadoopCatalog to create Iceberg table. When creating managed table,
we can use SQL like this:
  CREATE TABLE default.iceberg_test (
level string,
event_time timestamp,
message string,
  )
  STORED AS ICEBERG
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog',
'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test');
We supported two values ('hadoop.catalog', 'hadoop.tables') for
'iceberg.catalog' now. If you don't specify this property in your SQL,
default catalog type is 'hadoop.catalog'.
As for external Iceberg table, you can use SQL like this:
  CREATE EXTERNAL TABLE default.iceberg_test_external
  STORED AS ICEBERG
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog',
'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test',
'iceberg.table_identifier'='default.iceberg_test');
We cannot set table location for both managed and external Iceberg
table with 'hadoop.catalog', and 'SHOW CREATE TABLE' will not display
table location yet. We need to use 'DESCRIBE FORMATTED/EXTENDED' to
get this location info.
'iceberg.catalog_location' is necessary for 'hadoop.catalog' table,
which used to reserved Iceberg table metadata and data, and we use this
location to load table metadata from Iceberg.
'iceberg.table_identifier' is used for Icebreg TableIdentifier.If this
property not been specified in SQL, Impala will use database and table name
to load Iceberg table, which is 'default.iceberg_test_external' in above SQL.
This property value is splitted by '.', you can alse set this value like this:
'org.my_db.my_tbl'. And this property is valid for both managed and external
table.

Testing:
- Create table tests in functional_schema_template.sql
- Iceberg table create test in test_iceberg.py
- Iceberg table query test in test_scanners.py
- Iceberg table show create table test in test_show_create_table.py

Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/1-1-bc402da0-b562-4310-9001-06f9b6b0f9ae-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/6-6-d253aefa-65fc-4698-8f26-b155fc965cf6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/9-9-5d04b016-05e1-43fc-b4a0-0e0df52a5035-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-20b92523-c3b9-401d-b429-363c245dbe9c-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-c86370cf-10a1-4e49-86dc-b094fe739aa6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-f32f86fa-286f-4cd3-8337-98685c48176d-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-b18d2bbc-46a2-4040-a4a8-7488447de3b6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-c9bda250-ed1c-4868-bbf1-f2aad65fa80c-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/4-4-0ed77823-ded1-4a12-9e03-4027cd43966a-0.parquet
A 

[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 17:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7306/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 17
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Tue, 29 Sep 2020 05:00:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 16:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7305/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 16
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Tue, 29 Sep 2020 04:56:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-28 Thread wangsheng (Code Review)
wangsheng has uploaded a new patch set (#17). ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..

IMPALA-10164: Supporting HadoopCatalog for Iceberg table

This patch mainly realizes creating Iceberg table by HadoopCatalog.
We only supported HadoopTables api before this patch, but now we can
use HadoopCatalog to create Iceberg table. When creating managed table,
we can use SQL like this:
  CREATE TABLE default.iceberg_test (
level string,
event_time timestamp,
message string,
  )
  STORED AS ICEBERG
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog',
'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test');
We supported two values ('hadoop.catalog', 'hadoop.tables') for
'iceberg.catalog' now. If you don't specify this property in your SQL,
default catalog type is 'hadoop.catalog'.
As for external Iceberg table, you can use SQL like this:
  CREATE EXTERNAL TABLE default.iceberg_test_external
  STORED AS ICEBERG
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog',
'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test',
'iceberg.table_identifier'='default.iceberg_test');
We cannot set table location for both managed and external Iceberg
table with 'hadoop.catalog', and 'SHOW CREATE TABLE' will not display
table location yet. We need to use 'DESCRIBE FORMATTED/EXTENDED' to
get this location info.
'iceberg.catalog_location' is necessary for 'hadoop.catalog' table,
which used to reserved Iceberg table metadata and data, and we use this
location to load table metadata from Iceberg.
'iceberg.table_identifier' is used for Icebreg TableIdentifier.If this
property not been specified in SQL, Impala will use database and table name
to load Iceberg table, which is 'default.iceberg_test_external' in above SQL.
This property value is splitted by '.', you can alse set this value like this:
'org.my_db.my_tbl'. And this property is valid for both managed and external
table.

Testing:
- Create table tests in functional_schema_template.sql
- Iceberg table create test in test_iceberg.py
- Iceberg table query test in test_scanners.py
- Iceberg table show create table test in test_show_create_table.py

Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/1-1-bc402da0-b562-4310-9001-06f9b6b0f9ae-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/6-6-d253aefa-65fc-4698-8f26-b155fc965cf6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/9-9-5d04b016-05e1-43fc-b4a0-0e0df52a5035-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-20b92523-c3b9-401d-b429-363c245dbe9c-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-c86370cf-10a1-4e49-86dc-b094fe739aa6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-f32f86fa-286f-4cd3-8337-98685c48176d-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-b18d2bbc-46a2-4040-a4a8-7488447de3b6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-c9bda250-ed1c-4868-bbf1-f2aad65fa80c-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/4-4-0ed77823-ded1-4a12-9e03-4027cd43966a-0.parquet
A 

[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 16:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16446/16/fe/src/main/java/org/apache/impala/util/IcebergUtil.java
File fe/src/main/java/org/apache/impala/util/IcebergUtil.java:

http://gerrit.cloudera.org:8080/#/c/16446/16/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@109
PS16, Line 109: return getIcebergTableMetadata(table.getIcebergCatalog(), 
getIcebergTableIdentifier(table),
line too long (95 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 16
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Tue, 29 Sep 2020 04:36:04 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-28 Thread wangsheng (Code Review)
wangsheng has uploaded a new patch set (#16). ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..

IMPALA-10164: Supporting HadoopCatalog for Iceberg table

This patch mainly realizes creating Iceberg table by HadoopCatalog.
We only supported HadoopTables api before this patch, but now we can
use HadoopCatalog to create Iceberg table. When creating managed table,
we can use SQL like this:
  CREATE TABLE default.iceberg_test (
level string,
event_time timestamp,
message string,
  )
  STORED AS ICEBERG
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog',
'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test');
We supported two values ('hadoop.catalog', 'hadoop.tables') for
'iceberg.catalog' now. If you don't specify this property in your SQL,
default catalog type is 'hadoop.catalog'.
As for external Iceberg table, you can use SQL like this:
  CREATE EXTERNAL TABLE default.iceberg_test_external
  STORED AS ICEBERG
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog',
'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test',
'iceberg.table_identifier'='default.iceberg_test');
We cannot set table location for both managed and external Iceberg
table with 'hadoop.catalog', and 'SHOW CREATE TABLE' will not display
table location yet. We need to use 'DESCRIBE FORMATTED/EXTENDED' to
get this location info.
'iceberg.catalog_location' is necessary for 'hadoop.catalog' table,
which used to reserved Iceberg table metadata and data, and we use this
location to load table metadata from Iceberg.
'iceberg.table_identifier' is used for Icebreg TableIdentifier.If this
property not been specified in SQL, Impala will use database and table name
to load Iceberg table, which is 'default.iceberg_test_external' in above SQL.
This property value is splitted by '.', you can alse set this value like this:
'org.my_db.my_tbl'. And this property is valid for both managed and external
table.

Testing:
- Create table tests in functional_schema_template.sql
- Iceberg table create test in test_iceberg.py
- Iceberg table query test in test_scanners.py
- Iceberg table show create table test in test_show_create_table.py

Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/1-1-bc402da0-b562-4310-9001-06f9b6b0f9ae-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/6-6-d253aefa-65fc-4698-8f26-b155fc965cf6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/9-9-5d04b016-05e1-43fc-b4a0-0e0df52a5035-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-20b92523-c3b9-401d-b429-363c245dbe9c-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-c86370cf-10a1-4e49-86dc-b094fe739aa6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-f32f86fa-286f-4cd3-8337-98685c48176d-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-b18d2bbc-46a2-4040-a4a8-7488447de3b6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-c9bda250-ed1c-4868-bbf1-f2aad65fa80c-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/4-4-0ed77823-ded1-4a12-9e03-4027cd43966a-0.parquet
A 

[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-28 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 15: Code-Review+1

(6 comments)

Thanks for applying the changes! I think the change looks good, I only had some 
minor comments.

http://gerrit.cloudera.org:8080/#/c/16446/15//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16446/15//COMMIT_MSG@29
PS15, Line 29: table_name
Probably 'table_identifier' would be more appropriate as it would follow 
Iceberg's terminology.


http://gerrit.cloudera.org:8080/#/c/16446/15/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java:

http://gerrit.cloudera.org:8080/#/c/16446/15/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java@592
PS15, Line 592: // Only external table can set 'iceberg.table_name'
Are we sure that we want to disallow this?

I'm thinking of a use case when users use Iceberg's HadoopCatalog with more 
fine grained table identifiers, e.g. 'organization.database.table_name'. I 
think Iceberg allows arbitraly deep table identifiers.

So users might also want to create some tables managed by Impala, but with the 
above schema, e.g.:

 CREATE TABLE my_hms_db.my_ice_table (i int)
 STORED AS ICEBERG
 TBLPROPERTIES(
   'iceberg.catalog' = 'hadoop.catalog',
   'iceberg.catalog_location' = '/warehouse/ice-catalog',
   'iceberg.table_identifier' = 'my_organization.my_ice_db.my_table');

So if 'table_identifier' is missing, we should use 'db_name.tbl_name' as 
Iceberg table identifier, otherwise we should use the explicit param.

Though I'm also happy with the current behavior if you don't want to deal with 
it in this patch.


http://gerrit.cloudera.org:8080/#/c/16446/15/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/16446/15/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2582
PS15, Line 2582:   
newTable.getSd().setLocation(String.format("%s/%s/%s", location,
   :   newTable.getDbName(), 
newTable.getTableName()));
Instead of figuring out the location, we should get it from Iceberg:
https://github.com/apache/iceberg/blob/master/api/src/main/java/org/apache/iceberg/Table.java#L94

So if Iceberg change its location-computing algorithm in the future, HMS will 
still point to the right place.


http://gerrit.cloudera.org:8080/#/c/16446/15/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/16446/15/testdata/datasets/functional/functional_schema_template.sql@2969
PS15, Line 2969: 'iceberg.catalog_location'
nit: please start new line


http://gerrit.cloudera.org:8080/#/c/16446/15/testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test:

http://gerrit.cloudera.org:8080/#/c/16446/15/testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test@154
PS15, Line 154: 'iceberg.catalog_location'='/$DATABASE/hadoop_catalog_test');
nit: please try to keep lines less than 90 chars in the QUERY section.


http://gerrit.cloudera.org:8080/#/c/16446/15/testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
File 
testdata/workloads/functional-query/queries/QueryTest/show-create-table.test:

http://gerrit.cloudera.org:8080/#/c/16446/15/testdata/workloads/functional-query/queries/QueryTest/show-create-table.test@585
PS15, Line 585: 'iceberg.catalog_location'='$$location_uri$$')
nit: please try to keep lines less than 90 chars.



--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 15
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Mon, 28 Sep 2020 12:30:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 15:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7294/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 15
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Mon, 28 Sep 2020 08:01:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-28 Thread wangsheng (Code Review)
wangsheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 15:

> Hi WangSheng,
 >
 > Yes, it's what I meant, I only have minor corrections, answered
 > inline:
 >
 > > 1. We prohibit explicitly setting the table LOCATION when using
 > 'hadoop.catalog'
 > Yes, I think we should prohibit it and let Iceberg decide the
 > location.
 >
 > > 2. When execute 'SHOW CREATE TABLE', table location is
 > 'hdfs://test-warehouse/ice-catalog/default/ice_t'
 > If we prohibit setting table location, then we can't output it in
 > SHOW CREATE TABLE.
 >
 > > 3. When execute 'DESCRIBE FORMATTED', table location is
 > 'hdfs://test-warehouse/ice-catalog/default/ice_t', and
 > 'iceberg.catalog_location' is 'hdfs://test-warehouse/ice-catalog'
 > Yes.
 >
 > > 4. When execute 'DROP TABLE', we only delete 
 > > 'hdfs://test-warehouse/ice-catalog/default/ice_t',
 > and reserve 'hdfs://test-warehouse/ice-catalog'
 > Yes.
 >
 > > 5. 'iceberg.catalog_location' is necessary for both managed and
 > external 'hadoop.catalog' Iceberg table
 > Yes. Maybe later we could also have a default location somewhere
 > under the warehouse directory. But for now I think explicit catalog
 > locations are better.
 >
 > > 6. When loading managed table, we use 
 > > load('iceberg.catalog_location',TableIdentifier(db.tbl))
 > Yes.
 >
 > > 7. When loading external table, we use 
 > > load('iceberg.catalog_location',TableIdentifier('iceberg.table_name'))
 > Yes, maybe rename 'iceberg.table_name' to 'iceberg.table_identifier'?
 > And the value of it should be a dot-separated string of names that
 > was originally used to create the Iceberg table. Iceberg can
 > already make table identifiers from dot-separated strings:
 > https://github.com/apache/iceberg/blob/1a797cd986c9193f2e422fec295aaf25ce7e1916/api/src/main/java/org/apache/iceberg/catalog/TableIdentifier.java#L47
 >
 > >
 > > Does my understanding correct?
 > >
 > > And still some questions:
 > >
 > > 1. When creating external 'hadoop.catalog' Iceberg table, are we
 > load table by 'iceberg.catalog_location' and 'iceberg.table_name'?
 > If so, both these two properties are necessary.
 > So instead of 'iceberg.table_name' we could have 'iceberg.table_identifier',
 > just to use Iceberg terminology and semantics. Yeah, probably we
 > want it to be a required property for EXTERNAL hadoop catalog
 > tables. I just wonder if it would make sense to infer it from
 > 'db_name.tbl_name'. But let's start with it as a required property,
 > then later we'll se if inferring it makes sense and add support for
 > it in a future patch.
 >
 > > 2. Can we set the table LOCATION for 'hadoop.catalog' external
 > table? If not, how do we define table LOCATION? Using
 > 'iceberg.catalog_location'+/db/external_tbl? Or use
 > 'iceberg.catalog_location'+'iceberg.table_name'?
 > I think we should not set LOCATION. Iceberg uses catalog location +
 > table identifier to find and load tables, so I think we should
 > provide those:
 > 'iceberg.catalog_location'+'iceberg.table_identifier'
 > And if Iceberg can load the table with the given information, we
 > can set the actual table location. If the table is not found, we
 > should raise an error.

Hi Zoltan, thanks for your careful review and patient reply. I've already 
modify code according  to your advice. A Jenkins test has already been 
submitted: https://jenkins.impala.io/job/pre-review-test/721/


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 15
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Mon, 28 Sep 2020 07:45:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-28 Thread wangsheng (Code Review)
wangsheng has uploaded a new patch set (#15). ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..

IMPALA-10164: Supporting HadoopCatalog for Iceberg table

This patch mainly realizes creating Iceberg table by HadoopCatalog.
We only supported HadoopTables api before this patch, but now we can
use HadoopCatalog to create Iceberg table. When creating managed table,
we can use SQL like this:
  CREATE TABLE default.iceberg_test (
level string,
event_time timestamp,
message string,
  )
  STORED AS ICEBERG
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog',
'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test');
We supported two values ('hadoop.catalog', 'hadoop.tables') for
'iceberg.catalog' now. If you don't specify this property in your SQL,
default catalog type is 'hadoop.catalog'.
As for external Iceberg table, you can use SQL like this:
  CREATE EXTERNAL TABLE default.iceberg_test_external
  STORED AS ICEBERG
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog',
'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test',
'iceberg.table_name'='default.iceberg_test');
We cannot set table location for both managed and external Iceberg
table with 'hadoop.catalog', and 'SHOW CREATE TABLE' will not display
table location yet. We need to use 'DESCRIBE FORMATTED/EXTENDED' to
get this location info.
'iceberg.catalog_location' is necessary for 'hadoop.catalog' table,
which used to reserved Iceberg table metadata and data, and we use this
location to load table metadata from Iceberg.
'iceberg.table_name' is the managed Iceberg table name, just like
'kudu.table_name' when creating external Kudu table. If this property
not been specified in SQL, Impala will use database and table name to
load Iceberg table, which is 'default.iceberg_test_external' in above SQL.
This property cannot be set with managed table.

Testing:
- Create table tests in functional_schema_template.sql
- Iceberg table create test in test_iceberg.py
- Iceberg table query test in test_scanners.py
- Iceberg table show create table test in test_show_create_table.py

Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/1-1-bc402da0-b562-4310-9001-06f9b6b0f9ae-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/6-6-d253aefa-65fc-4698-8f26-b155fc965cf6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/9-9-5d04b016-05e1-43fc-b4a0-0e0df52a5035-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-20b92523-c3b9-401d-b429-363c245dbe9c-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-c86370cf-10a1-4e49-86dc-b094fe739aa6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-f32f86fa-286f-4cd3-8337-98685c48176d-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-b18d2bbc-46a2-4040-a4a8-7488447de3b6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-c9bda250-ed1c-4868-bbf1-f2aad65fa80c-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/4-4-0ed77823-ded1-4a12-9e03-4027cd43966a-0.parquet
A 

[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 14:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7288/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 14
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Sun, 27 Sep 2020 13:50:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-27 Thread wangsheng (Code Review)
wangsheng has uploaded a new patch set (#14). ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..

IMPALA-10164: Supporting HadoopCatalog for Iceberg table

This patch mainly realizes creating Iceberg table by HadoopCatalog.
We only supported HadoopTables api before this patch, but now we can
use HadoopCatalog to create Iceberg table. When creating managed table,
we can use SQL like this:
  CREATE TABLE default.iceberg_test (
level string,
event_time timestamp,
message string,
  )
  STORED AS ICEBERG
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog',
'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test');
We supported two values ('hadoop.catalog', 'hadoop.tables') for
'iceberg.catalog' now. If you don't specify this property in your SQL,
default catalog type is 'hadoop.catalog'.
As for external Iceberg table, you can use SQL like this:
  CREATE EXTERNAL TABLE default.iceberg_test_external
  STORED AS ICEBERG
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog',
'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test',
'iceberg.table_name'='default.iceberg_test');
We cannot set table location for both managed and external Iceberg
table with 'hadoop.catalog', and 'SHOW CREATE TABLE' will not display
table location yet. We need to use 'DESCRIBE FORMATTED/EXTENDED' to
get this location info.
'iceberg.catalog_location' is necessary for 'hadoop.catalog' table,
which used to reserved Iceberg table metadata and data, and we use this
location to load table metadata from Iceberg.
'iceberg.table_name' is the managed Iceberg table name, just like
'kudu.table_name' when creating external Kudu table. If this property
not been specified in SQL, Impala will use database and table name to
load Iceberg table, which is 'default.iceberg_test_external' in above SQL.
This property cannot be set with managed table.

Testing:
- Create table tests in functional_schema_template.sql
- Iceberg table create test in test_iceberg.py
- Iceberg table query test in test_scanners.py
- Iceberg table show create table test in test_show_create_table.py

Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/1-1-bc402da0-b562-4310-9001-06f9b6b0f9ae-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/6-6-d253aefa-65fc-4698-8f26-b155fc965cf6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/9-9-5d04b016-05e1-43fc-b4a0-0e0df52a5035-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-20b92523-c3b9-401d-b429-363c245dbe9c-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-c86370cf-10a1-4e49-86dc-b094fe739aa6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-f32f86fa-286f-4cd3-8337-98685c48176d-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-b18d2bbc-46a2-4040-a4a8-7488447de3b6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-c9bda250-ed1c-4868-bbf1-f2aad65fa80c-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/4-4-0ed77823-ded1-4a12-9e03-4027cd43966a-0.parquet
A 

[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-25 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 13:

Hi WangSheng,

Yes, it's what I meant, I only have minor corrections, answered inline:

> 1. We prohibit explicitly setting the table LOCATION when using 
> 'hadoop.catalog'
Yes, I think we should prohibit it and let Iceberg decide the location.

> 2. When execute 'SHOW CREATE TABLE', table location is 
> 'hdfs://test-warehouse/ice-catalog/default/ice_t'
If we prohibit setting table location, then we can't output it in SHOW CREATE 
TABLE.

> 3. When execute 'DESCRIBE FORMATTED', table location is 
> 'hdfs://test-warehouse/ice-catalog/default/ice_t', and 
> 'iceberg.catalog_location' is 'hdfs://test-warehouse/ice-catalog'
Yes.

> 4. When execute 'DROP TABLE', we only delete 
> 'hdfs://test-warehouse/ice-catalog/default/ice_t', and reserve 
> 'hdfs://test-warehouse/ice-catalog'
Yes.

> 5. 'iceberg.catalog_location' is necessary for both managed and external 
> 'hadoop.catalog' Iceberg table
Yes. Maybe later we could also have a default location somewhere under the 
warehouse directory. But for now I think explicit catalog locations are better.

> 6. When loading managed table, we use 
> load('iceberg.catalog_location',TableIdentifier(db.tbl))
Yes.

> 7. When loading external table, we use 
> load('iceberg.catalog_location',TableIdentifier('iceberg.table_name'))
Yes, maybe rename 'iceberg.table_name' to 'iceberg.table_identifier'? And the 
value of it should be a dot-separated string of names that was originally used 
to create the Iceberg table. Iceberg can already make table identifiers from 
dot-separated strings:
https://github.com/apache/iceberg/blob/1a797cd986c9193f2e422fec295aaf25ce7e1916/api/src/main/java/org/apache/iceberg/catalog/TableIdentifier.java#L47

>
> Does my understanding correct?
>
> And still some questions:
>
> 1. When creating external 'hadoop.catalog' Iceberg table, are we load table 
> by 'iceberg.catalog_location' and 'iceberg.table_name'? If so, both these two 
> properties are necessary.
So instead of 'iceberg.table_name' we could have 'iceberg.table_identifier', 
just to use Iceberg terminology and semantics. Yeah, probably we want it to be 
a required property for EXTERNAL hadoop catalog tables. I just wonder if it 
would make sense to infer it from 'db_name.tbl_name'. But let's start with it 
as a required property, then later we'll se if inferring it makes sense and add 
support for it in a future patch.

> 2. Can we set the table LOCATION for 'hadoop.catalog' external table? If not, 
> how do we define table LOCATION? Using 
> 'iceberg.catalog_location'+/db/external_tbl? Or use 
> 'iceberg.catalog_location'+'iceberg.table_name'?
I think we should not set LOCATION. Iceberg uses catalog location + table 
identifier to find and load tables, so I think we should provide those:
'iceberg.catalog_location'+'iceberg.table_identifier'
And if Iceberg can load the table with the given information, we can set the 
actual table location. If the table is not found, we should raise an error.


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 13
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Fri, 25 Sep 2020 16:30:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-25 Thread wangsheng (Code Review)
wangsheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 13:

> Hi WangSheng, thank you for your reply.
 >
 > I think we should choose the option that causes the least
 > confusions.
 > With the current solution the users can create two tables in the
 > following way:
 >
 > CREATE TABLE ice_1 (i int) STORED AS ICEBERG LOCATION
 > 'hdfs://test-warehouse/ice-catalog';
 > CREATE TABLE ice_2 (s string) STORED AS ICEBERG LOCATION
 > 'hdfs://test-warehouse/ice-catalog';
 >
 > ice_1 and ice_2 will be created in the same hadoop catalog, but
 > they can contain their own data under their own (implicit) table
 > location.
 > I think it's a valid use case as iceberg supports creating multiple
 > tables in the same catalog. Moreover, it's probably how Iceberg
 > catalogs are meant to be used.
 > Now if the user DROPs one of them, HMS will remove the whole
 > catalog, possibly causing an unintended data loss.
 >
 > Note that this case is different than having two managed PARQUET
 > tables based on same location, because in that case the tables
 > point to the same data. Also I cannot think of a use case when
 > users should do that.
 >
 > I propose the following: we should introduce a new table property:
 > 'iceberg.catalog_location'.
 >
 > So users would create tables with the following statement if they
 > want to use hadoop catalog:
 >
 > CREATE TABLE ice_t (i int)
 > STORED AS ICEBERG
 > TBLPROPERTIES('iceberg.catalog'='hadoop_catalog',
 > 'iceberg.catalog_location'='hdfs://test-warehouse/ice-catalog');
 >
 > In that case it would be quite explicit what's happening. And we'd
 > set the table's location to what Iceberg computes for the table
 > (from  + ).
 > We probably even want to prohibit explicitly setting the table
 > LOCATION (so SHOW CREATE TABLE shouldn't include it either) when
 > using hadoop catalog.
 >
 > So DROP TABLE wouldn't affect other tables, and DESCRIBE FORMATTED
 > would automatically show the actual table LOCATION and
 > 'iceberg.catalog_location' (since it's a table property).
 > If we DROP all the tables in an iceberg catalog, then the empty
 > catalog directory will still remain, but I don't see that as a
 > serious issue.
 >
 > Tables created via HadoopTables are not affected, i.e. they
 > continue to work like they already work.
 >
 > What do you think about this approach?

Hi Zoltan, 'iceberg.catalog_location' is indeed a good approach, and here is 
some of my understanding for this property. For this table:
CREATE TABLE default.ice_t (i int)
  STORED AS ICEBERG
  TBLPROPERTIES('iceberg.catalog'='hadoop_catalog',
'iceberg.catalog_location'='hdfs://test-warehouse/ice-catalog');
1. We prohibit explicitly setting the table LOCATION when using 'hadoop.catalog'
2. When execute 'SHOW CREATE TABLE', table location is 
'hdfs://test-warehouse/ice-catalog/default/ice_t'
3. When execute 'DESCRIBE FORMATTED', table location is 
'hdfs://test-warehouse/ice-catalog/default/ice_t', and 
'iceberg.catalog_location' is 'hdfs://test-warehouse/ice-catalog'
4. When execute 'DROP TABLE', we only delete 
'hdfs://test-warehouse/ice-catalog/default/ice_t', and reserve 
'hdfs://test-warehouse/ice-catalog'
5. 'iceberg.catalog_location' is necessary for both managed and external 
'hadoop.catalog' Iceberg table
6. When loading managed table, we use 
load('iceberg.catalog_location',TableIdentifier(db.tbl))
7. When loading external table, we use 
load('iceberg.catalog_location',TableIdentifier('iceberg.table_name'))

Does my understanding correct?

And still some questions:

1. When creating external 'hadoop.catalog' Iceberg table, are we load table by 
'iceberg.catalog_location' and 'iceberg.table_name'? If so, both these two 
properties are necessary.
2. Can we set the table LOCATION for 'hadoop.catalog' external table? If not, 
how do we define table LOCATION? Using 
'iceberg.catalog_location'+/db/external_tbl? Or use 
'iceberg.catalog_location'+'iceberg.table_name'?


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 13
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Fri, 25 Sep 2020 14:25:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-25 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 13:

Hi WangSheng, thank you for your reply.

I think we should choose the option that causes the least confusions.
With the current solution the users can create two tables in the following way:

  CREATE TABLE ice_1 (i int) STORED AS ICEBERG LOCATION 
'hdfs://test-warehouse/ice-catalog';
  CREATE TABLE ice_2 (s string) STORED AS ICEBERG LOCATION 
'hdfs://test-warehouse/ice-catalog';

ice_1 and ice_2 will be created in the same hadoop catalog, but they can 
contain their own data under their own (implicit) table location.
I think it's a valid use case as iceberg supports creating multiple tables in 
the same catalog. Moreover, it's probably how Iceberg catalogs are meant to be 
used.
Now if the user DROPs one of them, HMS will remove the whole catalog, possibly 
causing an unintended data loss.

Note that this case is different than having two managed PARQUET tables based 
on same location, because in that case the tables point to the same data. Also 
I cannot think of a use case when users should do that.

I propose the following: we should introduce a new table property: 
'iceberg.catalog_location'.

So users would create tables with the following statement if they want to use 
hadoop catalog:

  CREATE TABLE ice_t (i int)
  STORED AS ICEBERG
  TBLPROPERTIES('iceberg.catalog'='hadoop_catalog',
'iceberg.catalog_location'='hdfs://test-warehouse/ice-catalog');

In that case it would be quite explicit what's happening. And we'd set the 
table's location to what Iceberg computes for the table (from  + ).
We probably even want to prohibit explicitly setting the table LOCATION (so 
SHOW CREATE TABLE shouldn't include it either) when using hadoop catalog.

So DROP TABLE wouldn't affect other tables, and DESCRIBE FORMATTED would 
automatically show the actual table LOCATION and 'iceberg.catalog_location' 
(since it's a table property).
If we DROP all the tables in an iceberg catalog, then the empty catalog 
directory will still remain, but I don't see that as a serious issue.

Tables created via HadoopTables are not affected, i.e. they continue to work 
like they already work.

What do you think about this approach?


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 13
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Fri, 25 Sep 2020 10:04:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-24 Thread wangsheng (Code Review)
wangsheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 13:

> (2 comments)

Hi Zoltan, I've already read your reply closely, it seems we have some 
different understanding, here is some of my opinions for this patch:
1. We use location in SQL as table root path, like 
'/test-warehouse/iceberg_test/hadoop_catalog/hadoop_catalog_test', regardless 
of the structure under this location. If we use 'hadoop.catalog', the structure 
like this:
/test-warehouse/iceberg_test/hadoop_catalog/my_db/my_table/metadata/xxx
/test-warehouse/iceberg_test/hadoop_catalog/my_db/my_table/data/xxx
And if we use 'hadoop.tables', the structure like this:
/test-warehouse/iceberg_test/hadoop_catalog/metadata/xxx
/test-warehouse/iceberg_test/hadoop_catalog/data/xxx
In this situation, whether creating managed or external Iceberg table based on 
'hadoop.catalog'/'hadoop.tables', we just need to provide a location 
'/test-warehouse/iceberg_test/hadoop_catalog'. Even if you don't provide a 
location in SQL when creating managed Iceberg table, we will also use 
'$DEFAULT_WAREHOUSE/my_table' as table root path. I think this keep the consist 
of 'hadoop.catalog' and 'hadoop.tables'. So we just need to remember a root 
table path.

2. Based on above situation, when creating two managed table on same location 
based on 'hadoop.catalog', drop one of the table, the location will be deleted 
by HMS. And I think this keep the consist of HdfsTable and IcebergTable. For 
example, when creating two managed PARQUET tables based on same location, drop 
one of the table, the whole location will also be deleted by HMS.

Based on above opinions, here is some of my questions:

1. If using HadoopCatalog.dropTable in code, the root path 
'test-warehouse/iceberg_test/hadoop_catalog' will be reserved, just deleted 
'/my_db/my_table', this is different from HadoopTables or normal HdfsTable, 
which DROP TABLE will delete whole location, does this make users feel confused?

2. DESCRIBE FORMATTED shows the actual table location 
'/test-warehouse/iceberg_test/hadoop_catalog/my_db/my_table', but SHOW CREATE 
TABLE shows sql location '/test-warehouse/iceberg_test/hadoop_catalog', I‘m not 
sure if this is appropriate for two queries return different location on same 
table. But same location for 'hadoop.tables'.

If you think the above two modifications are indeed better, I will adjust code 
in current patch as soon as possible.


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 13
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Fri, 25 Sep 2020 02:39:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-24 Thread wangsheng (Code Review)
wangsheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 13:

> (2 comments)

Hi Zoltan, I've already read your reply closely, it seems we have some 
different understanding, here is some of my opinions for this patch:
1. We use location in SQL as table root path, like 
'/test-warehouse/iceberg_test/hadoop_catalog/hadoop_catalog_test', regardless 
of the structure under this location. If we use 'hadoop.catalog', the structure 
like this:
/test-warehouse/iceberg_test/hadoop_catalog/my_db/my_table/metadata/xxx
/test-warehouse/iceberg_test/hadoop_catalog/my_db/my_table/data/xxx
And if we use 'hadoop.tables', the structure like this:
/test-warehouse/iceberg_test/hadoop_catalog/metadata/xxx
/test-warehouse/iceberg_test/hadoop_catalog/data/xxx
In this situation, whether creating managed or external Iceberg table based on 
'hadoop.catalog'/'hadoop.tables', we just need to provide a location 
'/test-warehouse/iceberg_test/hadoop_catalog'. Even if you don't provide a 
location in SQL when creating managed Iceberg table, we will also use 
'$DEFAULT_WAREHOUSE/my_table' as table root path. I think this keep the consist 
of 'hadoop.catalog' and 'hadoop.tables'. So we just need to remember a root 
table path.

2. Based on above situation, when creating two managed table on same location 
based on 'hadoop.catalog', drop one of the table, the location will be deleted 
by HMS. And I think this keep the consist of HdfsTable and IcebergTable. For 
example, when creating two managed PARQUET tables based on same location, drop 
one of the table, the whole location will also be deleted by HMS.

Based on above opinions, here is some of my questions:

1. If using HadoopCatalog.dropTable in code, the root path 
'test-warehouse/iceberg_test/hadoop_catalog' will be reserved, just deleted 
'/my_db/my_table', this is different from HadoopTables or normal HdfsTable, 
which DROP TABLE will delete whole location, does this make users feel confused?

2. DESCRIBE FORMATTED shows the actual table location 
'/test-warehouse/iceberg_test/hadoop_catalog/my_db/my_table', but SHOW CREATE 
TABLE shows sql location '/test-warehouse/iceberg_test/hadoop_catalog', I‘m not 
sure if this is appropriate for two queries return different location on same 
table. But same location for 'hadoop.tables'.

If you think the above two modifications are indeed better, I will adjust code 
in current patch as soon as possible.


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 13
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Fri, 25 Sep 2020 02:38:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-24 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 13:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16446/13/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test:

http://gerrit.cloudera.org:8080/#/c/16446/13/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test@80
PS13, Line 80: row_regex:.*CAUSED BY: NoSuchTableException: Table does not 
exist: $DATABASE.iceberg_test7*
Since the test only drops 'iceberg_test6' I think the expected behavior is that 
'iceberg_test7' still exists.

I'm OK with doing this in a separate patch and Jira, but in that case we should 
prohibit dropping Iceberg tables in this patch because this behavior is 
unexpected and can cause potential data loss.

We should probably use the dropTable 
(https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java#L183)
 method to drop the table from the Iceberg catalog, then drop the table from 
HMS, e.g. like here: 
https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L1918-L1920

But for IcebergTables we need to set the param 'purge' to false (Because the 
data files are already been purged by Iceberg. Also, the location parameter 
might points to the catalog location instead of the table location, and we 
don't want to remove the whole catalog).

So I think in the end you might want to add a dropTable() method to 
IcebergCatalogOpExecutor that drops Iceberg tables properly, both from Iceberg 
and HMS.


http://gerrit.cloudera.org:8080/#/c/16446/13/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test:

http://gerrit.cloudera.org:8080/#/c/16446/13/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@305
PS13, Line 305: 'Location:   
','$NAMENODE/test-warehouse/iceberg_test/hadoop_catalog/hadoop_catalog_test','NULL'
Can we show the actual table location 
(https://github.com/apache/iceberg/blob/ada047be14205a88396e0ddf8b0650a9f44c/api/src/main/java/org/apache/iceberg/Table.java#L94),
 not the HadoopCatalog location here?

So I think in the end we want the following behavior:

* DROP TABLE only drops the referred table. If the table is truly external (so 
it doesn't have 'external.table.purge' property), we only drop the table from 
HMS, keeping the data files.

* SHOW CREATE TABLE shows the CREATE TABLE statement that created the table, so 
for location it shows the hadoop catalog location

* DESCRIBE FORMATTED shows the actual table location



--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 13
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Thu, 24 Sep 2020 17:24:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-24 Thread wangsheng (Code Review)
wangsheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 13:

> Yeah, I just checked and there is no DESCRIBE EXTENDED test,
 > however there are DESCRIBE FORMATTED tests which is basically the
 > same thing.
 >
 > You can use
 >  RESULTS: VERIFY_IS_SUBSET

Already add two kinds of test case.


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 13
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Thu, 24 Sep 2020 12:41:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 13:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7270/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 13
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Thu, 24 Sep 2020 12:25:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-24 Thread wangsheng (Code Review)
wangsheng has uploaded a new patch set (#13). ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..

IMPALA-10164: Supporting HadoopCatalog for Iceberg table

This patch mainly realizes creating Iceberg table by HadoopCatalog.
We only supported HadoopTables api before this patch, but now we can
use HadoopCatalog to create Iceberg table. When creating managed table,
we can use SQL like this:
  CREATE TABLE default.iceberg_test (
level string,
event_time timestamp,
message string,
  )
  STORED AS ICEBERG
  LOCATION 'hdfs://test-warehouse/hadoop_catalog_test'
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog');
We supported two values ('hadoop.catalog', 'hadoop.tables') for
'iceberg.catalog' now. If you don't specify this property in your SQL,
default catalog type is 'hadoop.catalog'.
As for external Iceberg table, you can use SQL like this:
  CREATE EXTERNAL TABLE default.iceberg_test_external
  STORED AS ICEBERG
  LOCATION 'hdfs://test-warehouse/hadoop_catalog_test'
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog',
'iceberg.table_name'='default.iceberg_test');
'iceberg.table_name' is the managed Iceberg table name, just like
'kudu.table_name' when creating external Kudu table. If this property
not been specified in SQL, Impala will use database and table name to
load Iceberg table, which is 'default.iceberg_test_external' in above SQL.
This property cannot be set with managed table.

Testing:
- Create table tests in functional_schema_template.sql
- Iceberg table create test in test_iceberg.py
- Iceberg table query test in test_scanners.py

Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/1-1-bc402da0-b562-4310-9001-06f9b6b0f9ae-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/6-6-d253aefa-65fc-4698-8f26-b155fc965cf6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/9-9-5d04b016-05e1-43fc-b4a0-0e0df52a5035-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-20b92523-c3b9-401d-b429-363c245dbe9c-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-c86370cf-10a1-4e49-86dc-b094fe739aa6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-f32f86fa-286f-4cd3-8337-98685c48176d-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-b18d2bbc-46a2-4040-a4a8-7488447de3b6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-c9bda250-ed1c-4868-bbf1-f2aad65fa80c-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/4-4-0ed77823-ded1-4a12-9e03-4027cd43966a-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00014-14-f698d7a4-245f-44d5-8a59-ed511854c8f8-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00015-15-7c1d5490-91f7-47bd-a3b6-e86caa7fe47d-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00019-19-d2ef5fcf-4346-421f-b2ef-1f9d55fb4c84-0.parquet
A 

[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-24 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 12:

Yeah, I just checked and there is no DESCRIBE EXTENDED test, however there are 
DESCRIBE FORMATTED tests which is basically the same thing.

You can use
 RESULTS: VERIFY_IS_SUBSET


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 12
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Thu, 24 Sep 2020 09:06:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-24 Thread wangsheng (Code Review)
wangsheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 12:

> > Thanks for fixing the test!
 > > Currently I'm wondering about DROP TABLE. Could you please add a
 > > test that creates two Iceberg tables in the same HadoopCatalog,
 > > DROP one of them and check that the other one still exists.
 > > Could you please also add a DESCRIBE EXTENDED test to check the
 > > TBLPROPERTIES and LOCATION?
 >
 > Hi Zoltan, I don't understand about "creates two Iceberg tables in
 > the same HadoopCatalog". Do you mean create two managed table with
 > HadoopCatalog in different location? Or create two external table
 > with HadoopCatalog in same location? Or create a managed table, and
 > create a external location to this managed table?

Thanks for your patient explain, Zoltan, and I understand. Of course your 
understanding is totally correct! I will add this test case lately.
Another question is for 'DESCRIBE EXTENDED table', the result will returns lots 
of information, how do I define ' RESULTS' in test file? Use 'row_regex' to 
filter some info or list all values like this:
'# col_name','data_type','comment'
'','NULL','NULL'
'id','int','NULL'
'user','string','NULL'
...
I didn't found any case of 'DESCRIBE EXTENDED table' in 
'testdata/workloads/functional-query/queries/QueryTest', only found 'describe 
database extended xxx', which is very different from 'DESCRIBE EXTENDED xxx':
 RESULTS
'','$USER','USER'
'Owner: ','',''
'impala_test_desc_db1','$NAMENODE/$EXTERNAL_WAREHOUSE_DIR/impala_test_desc_db1.db',''


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 12
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Thu, 24 Sep 2020 09:00:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-24 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 12:

I was thinking about two managed Iceberg tables using the same hadoop catalog 
location.

If I understand correctly, in CREATE TABLE we can provide the location of the 
HadoopCatalog, i.e. a directory in HDFS.
Then the actual table location is derived from the HadoopCatalog location + 
DB_NAME + TABLE_NAME.
So in the end the Iceberg table will be located at some directory like:

hdfs://test-warehouse/iceberg-hadoop-catalog/my_database/my_table

And if we create a second managed table with the same HadoopCatalog location, 
it will be placed at:

hdfs://test-warehouse/iceberg-hadoop-catalog/my_database_2/my_table_2

But in the Hive Metastore, for table location we only store 
hdfs://test-warehouse/iceberg-hadoop-catalog/, right? So if my understanding is 
correct, on DROP TABLE we might delete the whole 
hdfs://test-warehouse/iceberg-hadoop-catalog/ directory. Or maybe I 
misunderstood something? But adding such test is useful anyway in my opinion.


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 12
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Thu, 24 Sep 2020 08:37:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-24 Thread wangsheng (Code Review)
wangsheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 12:

> Thanks for fixing the test!
 > Currently I'm wondering about DROP TABLE. Could you please add a
 > test that creates two Iceberg tables in the same HadoopCatalog,
 > DROP one of them and check that the other one still exists.
 > Could you please also add a DESCRIBE EXTENDED test to check the
 > TBLPROPERTIES and LOCATION?

Hi Zoltan, I don't understand about "creates two Iceberg tables in the same 
HadoopCatalog". Do you mean create two managed table with HadoopCatalog in 
different location? Or create two external table with HadoopCatalog in same 
location? Or create a managed table, and create a external location to this 
managed table?


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 12
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Thu, 24 Sep 2020 08:17:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-24 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 12:

Thanks for fixing the test!
Currently I'm wondering about DROP TABLE. Could you please add a test that 
creates two Iceberg tables in the same HadoopCatalog, DROP one of them and 
check that the other one still exists.
Could you please also add a DESCRIBE EXTENDED test to check the TBLPROPERTIES 
and LOCATION?


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 12
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Thu, 24 Sep 2020 08:07:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-23 Thread wangsheng (Code Review)
wangsheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 12:

Hi Zoltan, already modify test case which failed during local catalog mode, but 
Jenkins failed based on patch 12: 
https://jenkins.impala.io/job/pre-review-test/713/
Seems also related to IMPALA-9923. The failed log is 
load-tpcds-core-hive-generated-orc-def-block.sql.log, error log is 'Error: 
Error while compiling statement: FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.MoveTask. java.io.IOException: Fail to get 
checksum, since file 
/test-warehouse/managed/tpcds.store_sales_orc_def/ss_sold_date_sk=2452612/base_005/_orc_acid_version
 is under construction. (state=08S01,code=1)
java.sql.SQLException: Error while compiling statement: FAILED: Execution 
Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. 
java.io.IOException: Fail to get checksum, since file 
/test-warehouse/managed/tpcds.store_sales_orc_def/ss_sold_date_sk=2452612/base_005/_orc_acid_version
 is under construction.'


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 12
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Thu, 24 Sep 2020 02:26:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 10: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6467/


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 10
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Wed, 23 Sep 2020 12:49:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 11:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7248/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 11
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Wed, 23 Sep 2020 08:26:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-23 Thread wangsheng (Code Review)
wangsheng has uploaded a new patch set (#11). ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..

IMPALA-10164: Supporting HadoopCatalog for Iceberg table

This patch mainly realizes creating Iceberg table by HadoopCatalog.
We only supported HadoopTables api before this patch, but now we can
use HadoopCatalog to create Iceberg table. When creating managed table,
we can use SQL like this:
  CREATE TABLE default.iceberg_test (
level string,
event_time timestamp,
message string,
  )
  STORED AS ICEBERG
  LOCATION 'hdfs://test-warehouse/hadoop_catalog_test'
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog');
We supported two values ('hadoop.catalog', 'hadoop.tables') for
'iceberg.catalog' now. If you don't specify this property in your SQL,
default catalog type is 'hadoop.catalog'.
As for external Iceberg table, you can use SQL like this:
  CREATE EXTERNAL TABLE default.iceberg_test_external
  STORED AS ICEBERG
  LOCATION 'hdfs://test-warehouse/hadoop_catalog_test'
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog',
'iceberg.table_name'='default.iceberg_test');
'iceberg.table_name' is the managed Iceberg table name, just like
'kudu.table_name' when creating external Kudu table. If this property
not been specified in SQL, Impala will use database and table name to
load Iceberg table, which is 'default.iceberg_test_external' in above SQL.
This property cannot be set with managed table.

Testing:
- Create table tests in functional_schema_template.sql
- Iceberg table create test in test_iceberg.py
- Iceberg table query test in test_scanners.py

Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/1-1-bc402da0-b562-4310-9001-06f9b6b0f9ae-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/6-6-d253aefa-65fc-4698-8f26-b155fc965cf6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/9-9-5d04b016-05e1-43fc-b4a0-0e0df52a5035-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-20b92523-c3b9-401d-b429-363c245dbe9c-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-c86370cf-10a1-4e49-86dc-b094fe739aa6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-f32f86fa-286f-4cd3-8337-98685c48176d-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-b18d2bbc-46a2-4040-a4a8-7488447de3b6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-c9bda250-ed1c-4868-bbf1-f2aad65fa80c-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/4-4-0ed77823-ded1-4a12-9e03-4027cd43966a-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00014-14-f698d7a4-245f-44d5-8a59-ed511854c8f8-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00015-15-7c1d5490-91f7-47bd-a3b6-e86caa7fe47d-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00019-19-d2ef5fcf-4346-421f-b2ef-1f9d55fb4c84-0.parquet
A 

[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-23 Thread wangsheng (Code Review)
wangsheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 10:

> The data loading failure was probably due to IMPALA-9923 so I
 > restarted the verify job.
 > However, later I've found that there were some failing Iceberg
 > tests in the dockerised environment, so it's probably reproducible
 > in local catalog mode:
 >
 > bin/start-impala-cluster.py --impalad_args --enable_minidumps=false
 > --impalad_args --use_local_catalog=true --catalogd_args
 > --catalog_topic_mode=minimal
 >
 > https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/3199/testReport/

ok, I will test local catalog mode in my own environment with these new test 
cases.


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 10
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Wed, 23 Sep 2020 07:37:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-23 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 9:

The data loading failure was probably due to IMPALA-9923 so I restarted the 
verify job.
However, later I've found that there were some failing Iceberg tests in the 
dockerised environment, so it's probably reproducible in local catalog mode:

bin/start-impala-cluster.py --impalad_args --enable_minidumps=false 
--impalad_args --use_local_catalog=true --catalogd_args 
--catalog_topic_mode=minimal

https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/3199/testReport/


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 9
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Wed, 23 Sep 2020 07:34:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-23 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 10:

I see some Iceberg test related failures as well. Can these be related to this 
change?
https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/3199/


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 10
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Wed, 23 Sep 2020 07:29:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 10: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 10
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Wed, 23 Sep 2020 07:27:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6467/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 10
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Wed, 23 Sep 2020 07:27:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 9: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6461/


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 9
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Tue, 22 Sep 2020 18:52:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6461/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 9
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Tue, 22 Sep 2020 14:47:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 9: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 9
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Tue, 22 Sep 2020 14:47:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-22 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 8: Code-Review+2

(2 comments)

Thx for your changes! I'm fine with this patch. Don't forget to open a Jira for 
rewriting iceber_file_format, please. (Added a comment about this)

http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@69
PS3, Line 69: iceberg_file_format
> I don't have a strong opinion on that, whichever works best for you.
I see this was postponed to a later commit. I'm fine with that but could you 
open a Jira that we can keep track of? (please add impala-icegerg label)


http://gerrit.cloudera.org:8080/#/c/16446/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test:

http://gerrit.cloudera.org:8080/#/c/16446/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@191
PS2, Line 191: SELECT count(*) from hadoop_catalog_test_external
> This test files are generated by spark-shell, if you want to test non-exter
I see. Let's wait till the write support then.



--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 8
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Tue, 22 Sep 2020 14:47:04 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-22 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 8: Code-Review+1

Thanks for applying the changes, LTGM!


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 8
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Tue, 22 Sep 2020 13:26:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7235/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 8
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Tue, 22 Sep 2020 12:11:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-22 Thread wangsheng (Code Review)
wangsheng has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..

IMPALA-10164: Supporting HadoopCatalog for Iceberg table

This patch mainly realizes creating Iceberg table by HadoopCatalog.
We only supported HadoopTables api before this patch, but now we can
use HadoopCatalog to create Iceberg table. When creating managed table,
we can use SQL like this:
  CREATE TABLE default.iceberg_test (
level string,
event_time timestamp,
message string,
  )
  STORED AS ICEBERG
  LOCATION 'hdfs://test-warehouse/hadoop_catalog_test'
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog');
We supported two values ('hadoop.catalog', 'hadoop.tables') for
'iceberg.catalog' now. If you don't specify this property in your SQL,
default catalog type is 'hadoop.catalog'.
As for external Iceberg table, you can use SQL like this:
  CREATE EXTERNAL TABLE default.iceberg_test_external
  STORED AS ICEBERG
  LOCATION 'hdfs://test-warehouse/hadoop_catalog_test'
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog',
'iceberg.table_name'='default.iceberg_test');
'iceberg.table_name' is the managed Iceberg table name, just like
'kudu.table_name' when creating external Kudu table. If this property
not been specified in SQL, Impala will use database and table name to
load Iceberg table, which is 'default.iceberg_test_external' in above SQL.
This property cannot be set with managed table.

Testing:
- Create table tests in functional_schema_template.sql
- Iceberg table create test in test_iceberg.py
- Iceberg table query test in test_scanners.py

Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/1-1-bc402da0-b562-4310-9001-06f9b6b0f9ae-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/6-6-d253aefa-65fc-4698-8f26-b155fc965cf6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/9-9-5d04b016-05e1-43fc-b4a0-0e0df52a5035-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-20b92523-c3b9-401d-b429-363c245dbe9c-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-c86370cf-10a1-4e49-86dc-b094fe739aa6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-f32f86fa-286f-4cd3-8337-98685c48176d-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-b18d2bbc-46a2-4040-a4a8-7488447de3b6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-c9bda250-ed1c-4868-bbf1-f2aad65fa80c-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/4-4-0ed77823-ded1-4a12-9e03-4027cd43966a-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00014-14-f698d7a4-245f-44d5-8a59-ed511854c8f8-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00015-15-7c1d5490-91f7-47bd-a3b6-e86caa7fe47d-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00019-19-d2ef5fcf-4346-421f-b2ef-1f9d55fb4c84-0.parquet
A 

[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7231/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 7
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Tue, 22 Sep 2020 02:04:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-21 Thread wangsheng (Code Review)
wangsheng has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..

IMPALA-10164: Supporting HadoopCatalog for Iceberg table

This patch mainly realizes creating Iceberg table by HadoopCatalog.
We only supported HadoopTables api before this patch, but now we can
use HadoopCatalog to create Iceberg table. When creating managed table,
we can use SQL like this:
  CREATE TABLE default.iceberg_test (
level string,
event_time timestamp,
message string,
  )
  STORED AS ICEBERG
  LOCATION 'hdfs://test-warehouse/hadoop_catalog_test'
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog');
We supported two values ('hadoop.catalog', 'hadoop.tables') for
'iceberg.catalog' now. If you don't specify this property in your SQL,
default catalog type is 'hadoop.catalog'.
As for external Iceberg table, you can use SQL like this:
  CREATE EXTERNAL TABLE default.iceberg_test_external
  STORED AS ICEBERG
  LOCATION 'hdfs://test-warehouse/hadoop_catalog_test'
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog',
'iceberg.table_name'='default.iceberg_test');
'iceberg.table_name' is the managed Iceberg table name, just like
'kudu.table_name' when creating external Kudu table. If this property
not been specified in SQL, Impala will use database and table name to
load Iceberg table, which is 'default.iceberg_test_external' in above SQL.
This property cannot be set with managed table.

Testing:
- Create table tests in functional_schema_template.sql
- Iceberg table create test in test_iceberg.py
- Iceberg table query test in test_scanners.py

Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/1-1-bc402da0-b562-4310-9001-06f9b6b0f9ae-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/6-6-d253aefa-65fc-4698-8f26-b155fc965cf6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/9-9-5d04b016-05e1-43fc-b4a0-0e0df52a5035-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-20b92523-c3b9-401d-b429-363c245dbe9c-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-c86370cf-10a1-4e49-86dc-b094fe739aa6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-f32f86fa-286f-4cd3-8337-98685c48176d-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-b18d2bbc-46a2-4040-a4a8-7488447de3b6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-c9bda250-ed1c-4868-bbf1-f2aad65fa80c-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/4-4-0ed77823-ded1-4a12-9e03-4027cd43966a-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00014-14-f698d7a4-245f-44d5-8a59-ed511854c8f8-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00015-15-7c1d5490-91f7-47bd-a3b6-e86caa7fe47d-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00019-19-d2ef5fcf-4346-421f-b2ef-1f9d55fb4c84-0.parquet
A 

[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7219/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 6
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Mon, 21 Sep 2020 14:58:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-21 Thread wangsheng (Code Review)
Hello Gabor Kaszab, Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16446

to look at the new patch set (#6).

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..

IMPALA-10164: Supporting HadoopCatalog for Iceberg table

This patch mainly realizes creating Iceberg table by HadoopCatalog.
We only supported HadoopTables api before this patch, but now we can
use HadoopCatalog to create Iceberg table. When creating managed table,
we can use SQL like this:
  CREATE TABLE default.iceberg_test (
level string,
event_time timestamp,
message string,
  )
  STORED AS ICEBERG
  LOCATION 'hdfs://test-warehouse/hadoop_catalog_test'
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog');
We supported two values ('hadoop.catalog', 'hadoop.tables') for
'iceberg.catalog' now. If you don't specify this property in your SQL,
default catalog type is 'hadoop.catalog'.
As for external Iceberg table, you can use SQL like this:
  CREATE EXTERNAL TABLE default.iceberg_test_external
  STORED AS ICEBERG
  LOCATION 'hdfs://test-warehouse/hadoop_catalog_test'
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog',
'iceberg.table_name'='default.iceberg_test');
'iceberg.table_name' is the managed Iceberg table name, just like
'kudu.table_name' when creating external Kudu table. If this property
not been specified in SQL, Impala will use database and table name to
load Iceberg table, which is 'default.iceberg_test_external' in above SQL.
This property cannot be set with managed table.

Testing:
- Create table tests in functional_schema_template.sql
- Iceberg table create test in test_iceberg.py
- Iceberg table query test in test_scanners.py

Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/1-1-bc402da0-b562-4310-9001-06f9b6b0f9ae-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/6-6-d253aefa-65fc-4698-8f26-b155fc965cf6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/9-9-5d04b016-05e1-43fc-b4a0-0e0df52a5035-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-20b92523-c3b9-401d-b429-363c245dbe9c-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-c86370cf-10a1-4e49-86dc-b094fe739aa6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-f32f86fa-286f-4cd3-8337-98685c48176d-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-b18d2bbc-46a2-4040-a4a8-7488447de3b6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-c9bda250-ed1c-4868-bbf1-f2aad65fa80c-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/4-4-0ed77823-ded1-4a12-9e03-4027cd43966a-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00014-14-f698d7a4-245f-44d5-8a59-ed511854c8f8-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00015-15-7c1d5490-91f7-47bd-a3b6-e86caa7fe47d-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00019-19-d2ef5fcf-4346-421f-b2ef-1f9d55fb4c84-0.parquet
A 

[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7218/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 5
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Mon, 21 Sep 2020 14:17:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-21 Thread wangsheng (Code Review)
wangsheng has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..

IMPALA-10164: Supporting HadoopCatalog for Iceberg table

This patch mainly realizes creating Iceberg table by HadoopCatalog.
We only supported HadoopTables api before this patch, but now we can
use HadoopCatalog to create Iceberg table. When creating managed table,
we can use SQL like this:
  CREATE TABLE default.iceberg_test (
level string,
event_time timestamp,
message string,
  )
  STORED AS ICEBERG
  LOCATION 'hdfs://test-warehouse/hadoop_catalog_test'
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog');
We supported two values ('hadoop.catalog', 'hadoop.tables') for
'iceberg.catalog' now. If you don't specify this property in your SQL,
default catalog type is 'hadoop.catalog'.
As for external Iceberg table, you can use SQL like this:
  CREATE EXTERNAL TABLE default.iceberg_test_external
  STORED AS ICEBERG
  LOCATION 'hdfs://test-warehouse/hadoop_catalog_test'
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog',
'iceberg.table_name'='default.iceberg_test');
'iceberg.table_name' is the managed Iceberg table name, just like
'kudu.table_name' when creating external Kudu table. If this property
not been specified in SQL, Impala will use database and table name to
load Iceberg table, which is 'default.iceberg_test_external' in above SQL.
This property cannot be set with managed table.

Testing:
- Create table tests in functional_schema_template.sql
- Iceberg table create test in test_iceberg.py
- Iceberg table query test in test_scanners.py

Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/1-1-bc402da0-b562-4310-9001-06f9b6b0f9ae-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/6-6-d253aefa-65fc-4698-8f26-b155fc965cf6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/9-9-5d04b016-05e1-43fc-b4a0-0e0df52a5035-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-20b92523-c3b9-401d-b429-363c245dbe9c-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-c86370cf-10a1-4e49-86dc-b094fe739aa6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-f32f86fa-286f-4cd3-8337-98685c48176d-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-b18d2bbc-46a2-4040-a4a8-7488447de3b6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-c9bda250-ed1c-4868-bbf1-f2aad65fa80c-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/4-4-0ed77823-ded1-4a12-9e03-4027cd43966a-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00014-14-f698d7a4-245f-44d5-8a59-ed511854c8f8-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00015-15-7c1d5490-91f7-47bd-a3b6-e86caa7fe47d-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00019-19-d2ef5fcf-4346-421f-b2ef-1f9d55fb4c84-0.parquet
A 

[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-21 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 4: Code-Review+1

(2 comments)

Thanks for applying the changes!

http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@69
PS3, Line 69: iceberg_file_format
> Shall we change this property name in this patch? Maybe in another patch is
I don't have a strong opinion on that, whichever works best for you.


http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@78
PS3, Line 78: // Each table id increase from zero
: HadoopCatalog catalo
> Done
Maybe these functions should be private as well. I guess we only want them to 
be called from the public createTable() at L54.

Probably we should also rename them to createTableByHadoopTables() and 
createTableByHadoopCatalog().



--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 4
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Mon, 21 Sep 2020 13:26:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7217/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 4
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Mon, 21 Sep 2020 13:18:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-21 Thread wangsheng (Code Review)
wangsheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 4:

(7 comments)

I've already rename two new properties to: 'iceberg.catalog' and 
'iceberg.table_name', I'm not sure if this is appropriate to rename 
'iceberg_file_format' to 'iceberg.file_format' in this patch.

http://gerrit.cloudera.org:8080/#/c/16446/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16446/3//COMMIT_MSG@29
PS3, Line 29: .
> nit: Shouldn't we use 'iceberg.table_name' just like 'kudu.table_name'. In
Done


http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@69
PS3, Line 69: iceberg_file_format
> If that's not too much pain, maybe we should also change it to "iceberg.fil
Shall we change this property name in this patch? Maybe in another patch is 
better?


http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@72
PS3, Line 72: iceberg.catalog
> In the commit message you mention "iceberg.catalog", not "iceberg_catalog".
Done


http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@78
PS3, Line 78: // Each table id increase from zero
: HadoopCatalog catalo
> You already set it at L57.
Done


http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/util/IcebergUtil.java
File fe/src/main/java/org/apache/impala/util/IcebergUtil.java:

http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@102
PS3, Line 102:  {
> nit: probably 'Catalog.DEFAULT_DB' should be used
Done


http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@149
PS3, Line 149: .contains(
> nit: Catalog.DEFAULT_DB
Done


http://gerrit.cloudera.org:8080/#/c/16446/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test:

http://gerrit.cloudera.org:8080/#/c/16446/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@221
PS3, Line 221: SELECT count(*) from hadoop_catalog_test_external
> Please add "SET TIMEZONE=UTC;" to tests that involve timestamps. For furthe
Done



--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 4
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Mon, 21 Sep 2020 12:56:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-21 Thread wangsheng (Code Review)
wangsheng has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..

IMPALA-10164: Supporting HadoopCatalog for Iceberg table

This patch mainly realizes creating Iceberg table by HadoopCatalog.
We only supported HadoopTables api before this patch, but now we can
use HadoopCatalog to create Iceberg table. When creating managed table,
we can use SQL like this:
  CREATE TABLE default.iceberg_test (
level string,
event_time timestamp,
message string,
  )
  STORED AS ICEBERG
  LOCATION 'hdfs://test-warehouse/hadoop_catalog_test'
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog');
We supported two values ('hadoop.catalog', 'hadoop.tables') for
'iceberg.catalog' now. If you don't specify this property in your SQL,
default catalog type is 'hadoop.catalog'.
As for external Iceberg table, you can use SQL like this:
  CREATE EXTERNAL TABLE default.iceberg_test_external
  STORED AS ICEBERG
  LOCATION 'hdfs://test-warehouse/hadoop_catalog_test'
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog',
'iceberg.table_name'='default.iceberg_test');
'iceberg.table_name' is the managed Iceberg table name, just like
'kudu.table_name' when creating external Kudu table. If this property
not been specified in SQL, Impala will use database and table name to
load Iceberg table, which is 'default.iceberg_test_external' in above SQL.
This property cannot be set with managed table.

Testing:
- Create table tests in functional_schema_template.sql
- Iceberg table create test in test_iceberg.py
- Iceberg table query test in test_scanners.py

Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/1-1-bc402da0-b562-4310-9001-06f9b6b0f9ae-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/6-6-d253aefa-65fc-4698-8f26-b155fc965cf6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/9-9-5d04b016-05e1-43fc-b4a0-0e0df52a5035-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-20b92523-c3b9-401d-b429-363c245dbe9c-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-c86370cf-10a1-4e49-86dc-b094fe739aa6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-f32f86fa-286f-4cd3-8337-98685c48176d-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-b18d2bbc-46a2-4040-a4a8-7488447de3b6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-c9bda250-ed1c-4868-bbf1-f2aad65fa80c-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/4-4-0ed77823-ded1-4a12-9e03-4027cd43966a-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00014-14-f698d7a4-245f-44d5-8a59-ed511854c8f8-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00015-15-7c1d5490-91f7-47bd-a3b6-e86caa7fe47d-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00019-19-d2ef5fcf-4346-421f-b2ef-1f9d55fb4c84-0.parquet
A 

[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-21 Thread wangsheng (Code Review)
wangsheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 3:

(6 comments)

Done

http://gerrit.cloudera.org:8080/#/c/16446/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16446/2//COMMIT_MSG@21
PS2, Line 21: We supported two values ('hadoop.catalog', 'hadoop.tables') for
: 'iceberg.catalog'
> I'd list the currently possible values ('hadoop.catalog', 'hadoop.table').
Done


http://gerrit.cloudera.org:8080/#/c/16446/2//COMMIT_MSG@22
PS2, Line 22: now. If you don't specify this property i
> For me talking about external tables would be more readable in a separate p
Done


http://gerrit.cloudera.org:8080/#/c/16446/2//COMMIT_MSG@28
PS2, Line 28:
> nit: please avoid using tabs for indentation
Done


http://gerrit.cloudera.org:8080/#/c/16446/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test:

http://gerrit.cloudera.org:8080/#/c/16446/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test@56
PS2, Line 56: TBLPROPERTIES('iceberg_table_name'='fake_db.fake_table');
> Is it possible to create an external table using some invalid 'iceberg_tabl
Done


http://gerrit.cloudera.org:8080/#/c/16446/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test:

http://gerrit.cloudera.org:8080/#/c/16446/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@191
PS2, Line 191: SELECT count(*) from hadoop_catalog_test_external
> I know that we can't write Iceberg tables at the moment but is there any wa
This test files are generated by spark-shell, if you want to test non-external 
tables with 'hadoop_catalog', we need to create Iceberg table by Impala first, 
and then generate data by spark-shell. This maybe difficult to implement by 
impala test.
Now we can only query empty Iceberg table created by Impala.


http://gerrit.cloudera.org:8080/#/c/16446/2/testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
File 
testdata/workloads/functional-query/queries/QueryTest/show-create-table.test:

http://gerrit.cloudera.org:8080/#/c/16446/2/testdata/workloads/functional-query/queries/QueryTest/show-create-table.test@585
PS2, Line 585: TBLPROPERTIES('iceberg_file_format'='parquet', 
'iceberg_catalog'='hadoop.catalog')
> Could you also add test coverage here for the default value of 'iceberg_cat
Done



--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 3
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Mon, 21 Sep 2020 12:32:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-21 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 3:

(7 comments)

Thanks for your work again! This also makes it easier for us to add HiveCatalog 
later on. I only found some minor issues.

http://gerrit.cloudera.org:8080/#/c/16446/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16446/3//COMMIT_MSG@29
PS3, Line 29: _
nit: Shouldn't we use 'iceberg.table_name' just like 'kudu.table_name'. In the 
above example we use a dot for specifying the catalog, i.e. 'iceberg.catalog'. 
This is also what Hiveberg does, so I prefer the dot notation to be consistent.


http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@69
PS3, Line 69: iceberg_file_format
If that's not too much pain, maybe we should also change it to 
"iceberg.file_format" for consistency.


http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@72
PS3, Line 72: iceberg_catalog
In the commit message you mention "iceberg.catalog", not "iceberg_catalog".

Hiveberg also uses the dot syntax: https://github.com/ExpediaGroup/hiveberg

For Kudu options we also use the dot syntax, so maybe we should also use the 
dot syntax for Iceberg properties.


http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@78
PS3, Line 78: // Each table id increase from zero
: iThreadLocal.set(0);
You already set it at L57.


http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/util/IcebergUtil.java
File fe/src/main/java/org/apache/impala/util/IcebergUtil.java:

http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@102
PS3, Line 102: "default"
nit: probably 'Catalog.DEFAULT_DB' should be used


http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@149
PS3, Line 149: "default."
nit: Catalog.DEFAULT_DB


http://gerrit.cloudera.org:8080/#/c/16446/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test:

http://gerrit.cloudera.org:8080/#/c/16446/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@221
PS3, Line 221: where event_time > to_timestamp('2020-01-01 
09:00:00','-MM-dd HH:mm:ss')
Please add "SET TIMEZONE=UTC;" to tests that involve timestamps. For further 
info please see IMPALA-10158.



--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 3
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Mon, 21 Sep 2020 11:58:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7213/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 3
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Mon, 21 Sep 2020 10:05:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-21 Thread wangsheng (Code Review)
wangsheng has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..

IMPALA-10164: Supporting HadoopCatalog for Iceberg table

This patch mainly realizes creating Iceberg table by HadoopCatalog.
We only supported HadoopTables api before this patch, but now we can
use HadoopCatalog to create Iceberg table. When creating managed table,
we can use SQL like this:
  CREATE TABLE default.iceberg_test (
level string,
event_time timestamp,
message string,
  )
  STORED AS ICEBERG
  LOCATION 'hdfs://test-warehouse/hadoop_catalog_test'
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog');
We supported two values ('hadoop.catalog', 'hadoop.tables') for
'iceberg.catalog' now. If you don't specify this property in your SQL,
default catalog type is 'hadoop.catalog'.
As for external Iceberg table, you can use SQL like this:
  CREATE EXTERNAL TABLE default.iceberg_test_external
  STORED AS ICEBERG
  LOCATION 'hdfs://test-warehouse/hadoop_catalog_test'
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog',
'iceberg_table_name'='default.iceberg_test');
'iceberg_table_name' is the managed Iceberg table name, just like
'kudu.table_name' when creating external Kudu table. If this property
not been specified in SQL, Impala will use database and table name to
load Iceberg table, which is 'default.iceberg_test_external' in above SQL.
This property cannot be set with managed table.

Testing:
- Create table tests in functional_schema_template.sql
- Iceberg table create test in test_iceberg.py
- Iceberg table query test in test_scanners.py

Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/1-1-bc402da0-b562-4310-9001-06f9b6b0f9ae-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/6-6-d253aefa-65fc-4698-8f26-b155fc965cf6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/9-9-5d04b016-05e1-43fc-b4a0-0e0df52a5035-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-20b92523-c3b9-401d-b429-363c245dbe9c-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-c86370cf-10a1-4e49-86dc-b094fe739aa6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-f32f86fa-286f-4cd3-8337-98685c48176d-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-b18d2bbc-46a2-4040-a4a8-7488447de3b6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-c9bda250-ed1c-4868-bbf1-f2aad65fa80c-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/4-4-0ed77823-ded1-4a12-9e03-4027cd43966a-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00014-14-f698d7a4-245f-44d5-8a59-ed511854c8f8-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00015-15-7c1d5490-91f7-47bd-a3b6-e86caa7fe47d-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00019-19-d2ef5fcf-4346-421f-b2ef-1f9d55fb4c84-0.parquet
A 

[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-17 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 2:

(6 comments)

Thanks for the changes!
I have some minor ones but the patch in general is in a good shape. I'll let 
Zoltan to take a look also.

http://gerrit.cloudera.org:8080/#/c/16446/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16446/2//COMMIT_MSG@21
PS2, Line 21: If you don't specify this property in your SQL, default catalog 
type is
: 'hadoop.catalog'.
I'd list the currently possible values ('hadoop.catalog', 'hadoop.table').


http://gerrit.cloudera.org:8080/#/c/16446/2//COMMIT_MSG@22
PS2, Line 22: And if you want to create external table,
For me talking about external tables would be more readable in a separate 
paragraph.


http://gerrit.cloudera.org:8080/#/c/16446/2//COMMIT_MSG@28
PS2, Line 28:
nit: please avoid using tabs for indentation


http://gerrit.cloudera.org:8080/#/c/16446/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test:

http://gerrit.cloudera.org:8080/#/c/16446/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test@56
PS2, Line 56: TBLPROPERTIES('iceberg_table_name'='fake_db.fake_table');
Is it possible to create an external table using some invalid 
'iceberg_table_name'? Is there such a scenario and what is expected if there is?


http://gerrit.cloudera.org:8080/#/c/16446/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test:

http://gerrit.cloudera.org:8080/#/c/16446/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@191
PS2, Line 191: SELECT count(*) from hadoop_catalog_test_external
I know that we can't write Iceberg tables at the moment but is there any way to 
extend these tests for non-external tables that use 'hadoop_catalog'?


http://gerrit.cloudera.org:8080/#/c/16446/2/testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
File 
testdata/workloads/functional-query/queries/QueryTest/show-create-table.test:

http://gerrit.cloudera.org:8080/#/c/16446/2/testdata/workloads/functional-query/queries/QueryTest/show-create-table.test@585
PS2, Line 585: TBLPROPERTIES('iceberg_file_format'='parquet', 
'iceberg_catalog'='hadoop.catalog')
Could you also add test coverage here for the default value of 
'iceberg_catalog'?



--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 2
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Thu, 17 Sep 2020 16:32:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-15 Thread wangsheng (Code Review)
wangsheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 2:

(8 comments)

Hi Gabor, thanks for your review, I've already modify code.

http://gerrit.cloudera.org:8080/#/c/16446/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16446/1//COMMIT_MSG@20
PS1, Line 20: og');
: If you don't spe
> Is there any consideration behind making hadoop.tables the default? AFAIK h
Since we only supported HadoopTables before, I use this as default catalog 
type, there is no more reasons.
You are right, hadoop.catalog gives more than hadoop.tables, so if necessary, I 
will use hadoop.catalog as default catalog type and modified related files.


http://gerrit.cloudera.org:8080/#/c/16446/1/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

http://gerrit.cloudera.org:8080/#/c/16446/1/common/thrift/CatalogObjects.thrift@96
PS1, Line 96: identified
> nit: typo
Done


http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@244
PS1, Line 244: at getIcebergFile
> I find it confusing that IcebergUtil also has a function with the same name
Done


http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@71
PS1, Line 71:   // Iceberg catalog type key in tblproperties
> nit: pls add comment similarly to ICEBERG_FILE_FORMAT above.
Done


http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@111
PS1, Line 111:   return tableParams_.icebergCatalog_;
 :   }
 :
 :   @
> FeIcebergTable has 2 implementations, this class and IcebergTable. Both hav
I add icebergCatalog_ in TableParams in LocalIcebergTable.java, and return 
'tableParams_.icebergCatalog_' in getIcebergCatalog(), just like 
getIcebergTableLocation().
I found that this way is also suitable for getIcebergFileFormat(), I'm not sure 
to modify in this patch.


http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java:

http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@a93
PS1, Line 93:
> As I see this patch also contains some modifications related to Iceberg tab
Done


http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@54
PS1, Line 54: TIcebergCatalo
> Using TIcebergCatalog as a param would be easier to understand what is the
Done


http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@63
PS1, Line 63:   Preconditions.checkArgument(catalog == 
TIcebergCatalog.HADOOP_TABLES);
> Would worth a Preconditions check in the else branch that icebergCatalog ==
Done



--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 2
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Tue, 15 Sep 2020 13:39:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7180/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 2
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 15 Sep 2020 13:13:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-15 Thread wangsheng (Code Review)
wangsheng has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..

IMPALA-10164: Supporting HadoopCatalog for Iceberg table

This patch mainly realizes creating Iceberg table by HadoopCatalog.
We only supported HadoopTables api before this patch, but now we can
use HadoopCatalog to create Iceberg table. When creating managed table,
we can use SQL like this:
  CREATE TABLE default.iceberg_test (
level string,
event_time timestamp,
message string,
  )
  STORED AS ICEBERG
  LOCATION 'hdfs://test-warehouse/hadoop_catalog_test'
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog');
If you don't specify this property in your SQL, default catalog type is
'hadoop.catalog'. And if you want to create external table, you can use
SQL like this:
  CREATE EXTERNAL TABLE default.iceberg_test_external
  STORED AS ICEBERG
  LOCATION 'hdfs://test-warehouse/hadoop_catalog_test'
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog',
'iceberg_table_name'='default.iceberg_test');
'iceberg_table_name' is the managed Iceberg table name, just like
'kudu.table_name' when creating external Kudu table. If this property
not been specified in SQL, Impala will use database and table name to
load Iceberg table, which is 'default.iceberg_test_external' in above SQL.
This property cannot be set with managed table.

Testing:
- Create table tests in functional_schema_template.sql
- Iceberg table create test in test_iceberg.py
- Iceberg table query test in test_scanners.py

Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/1-1-bc402da0-b562-4310-9001-06f9b6b0f9ae-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/6-6-d253aefa-65fc-4698-8f26-b155fc965cf6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/9-9-5d04b016-05e1-43fc-b4a0-0e0df52a5035-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-20b92523-c3b9-401d-b429-363c245dbe9c-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-c86370cf-10a1-4e49-86dc-b094fe739aa6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-f32f86fa-286f-4cd3-8337-98685c48176d-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-b18d2bbc-46a2-4040-a4a8-7488447de3b6-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-c9bda250-ed1c-4868-bbf1-f2aad65fa80c-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/4-4-0ed77823-ded1-4a12-9e03-4027cd43966a-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00014-14-f698d7a4-245f-44d5-8a59-ed511854c8f8-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00015-15-7c1d5490-91f7-47bd-a3b6-e86caa7fe47d-0.parquet
A 
testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00019-19-d2ef5fcf-4346-421f-b2ef-1f9d55fb4c84-0.parquet
A 

[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-14 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16446/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16446/1//COMMIT_MSG@20
PS1, Line 20: default catalog type is
: 'hadoop.tables'.
Is there any consideration behind making hadoop.tables the default? AFAIK 
hadoop.catalog gives more than hadoop.tables so let's consider making it as a 
default.

But making one step forward, in the future we might want to introduce 
hive.catalog as well that works with S3 too so in the long run that should be 
the default in my opinion.



--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 1
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 14 Sep 2020 16:32:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-14 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 1:

(8 comments)

Thanks for taking care of this patch! I have some comments but nothing serious 
as the patch is in good shape in my opinion.

What I miss is some test coverage to cover at least the new syntax.

http://gerrit.cloudera.org:8080/#/c/16446/1/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

http://gerrit.cloudera.org:8080/#/c/16446/1/common/thrift/CatalogObjects.thrift@96
PS1, Line 96: identitied
nit: typo


http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@244
PS1, Line 244: getIcebergCatalog
I find it confusing that IcebergUtil also has a function with the same name but 
different parameterisation. Could you consider moving this one to that class 
next to that function with the same name?


http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@71
PS1, Line 71:   public static final String ICEBERG_CATALOG = "iceberg.catalog";
nit: pls add comment similarly to ICEBERG_FILE_FORMAT above.


http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@111
PS1, Line 111: @Override
 :   public TIcebergCatalog getIcebergCatalog() {
 : return icebergCatalog_;
 :   }
FeIcebergTable has 2 implementations, this class and IcebergTable. Both have 
the same implementation of getIcebergCatalog(). Please consider move the 
implementation of this function into FeIcebergTable.


http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@183
PS1, Line 183: getIcebergCatalog
If getIcebergCatalog(msTable) was moved to IcebergUtil then we could use it 
here.


http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java:

http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@a93
PS1, Line 93:
As I see this patch also contains some modifications related to Iceberg table 
location. Could you a) move it to a separate patch b) mention these changes in 
the commit message?


http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@54
PS1, Line 54: String catalog
Using TIcebergCatalog as a param would be easier to understand what is the 
intention of this param.


http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@63
PS1, Line 63: } else {
Would worth a Preconditions check in the else branch that icebergCatalog == 
HADOOP_TABLES



--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 1
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 14 Sep 2020 16:16:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7173/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 1
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 14 Sep 2020 13:47:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table

2020-09-14 Thread wangsheng (Code Review)
wangsheng has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16446


Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
..

IMPALA-10164: Supporting HadoopCatalog for Iceberg table

This patch mainly realizes creating Iceberg table by HadoopCatalog.
We only supported HadoopTables api before this patch, but now we can
use HadoopCatalog to create Iceberg table like this:
  CREATE EXTERNAL TABLE default.iceberg_test (
level string,
event_time timestamp,
message string,
  )
  STORED AS ICEBERG
  LOCATION 'hdfs://xxx'
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog');
If you don't specify this property in your SQL, default catalog type is
'hadoop.tables'.

Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
9 files changed, 158 insertions(+), 33 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/16446/1
--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 1
Gerrit-Owner: wangsheng