[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. IMPALA-10164: Supporting HadoopCatalog for Iceberg table This patch mainly realizes creating Iceberg table by HadoopCatalog. We only supported HadoopTables api before this patch, but now we can use HadoopCatalog to create Iceberg table. When creating managed table, we can use SQL like this: CREATE TABLE default.iceberg_test ( level string, event_time timestamp, message string, ) STORED AS ICEBERG TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog', 'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test'); We supported two values ('hadoop.catalog', 'hadoop.tables') for 'iceberg.catalog' now. If you don't specify this property in your SQL, default catalog type is 'hadoop.catalog'. As for external Iceberg table, you can use SQL like this: CREATE EXTERNAL TABLE default.iceberg_test_external STORED AS ICEBERG TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog', 'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test', 'iceberg.table_identifier'='default.iceberg_test'); We cannot set table location for both managed and external Iceberg table with 'hadoop.catalog', and 'SHOW CREATE TABLE' will not display table location yet. We need to use 'DESCRIBE FORMATTED/EXTENDED' to get this location info. 'iceberg.catalog_location' is necessary for 'hadoop.catalog' table, which used to reserved Iceberg table metadata and data, and we use this location to load table metadata from Iceberg. 'iceberg.table_identifier' is used for Icebreg TableIdentifier.If this property not been specified in SQL, Impala will use database and table name to load Iceberg table, which is 'default.iceberg_test_external' in above SQL. This property value is splitted by '.', you can alse set this value like this: 'org.my_db.my_tbl'. And this property is valid for both managed and external table. Testing: - Create table tests in functional_schema_template.sql - Iceberg table create test in test_iceberg.py - Iceberg table query test in test_scanners.py - Iceberg table show create table test in test_show_create_table.py Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Reviewed-on: http://gerrit.cloudera.org:8080/16446 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/1-1-bc402da0-b562-4310-9001-06f9b6b0f9ae-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/6-6-d253aefa-65fc-4698-8f26-b155fc965cf6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/9-9-5d04b016-05e1-43fc-b4a0-0e0df52a5035-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-20b92523-c3b9-401d-b429-363c245dbe9c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-c86370cf-10a1-4e49-86dc-b094fe739aa6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-f32f86fa-286f-4cd3-8337-98685c48176d-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-b18d2bbc-46a2-4040-a4a8-7488447de3b6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-c9bda250-ed1c-4868-bbf1-f2aad65fa80c-0.parquet A
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 22: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 22 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Thu, 01 Oct 2020 13:54:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 22: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6495/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 22 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Thu, 01 Oct 2020 08:25:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 22: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 22 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Thu, 01 Oct 2020 08:25:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 21: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6490/ -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 21 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Wed, 30 Sep 2020 19:42:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 21: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 21 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Wed, 30 Sep 2020 14:10:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 20: Failed again with IMPALA-10143 so I uploaded a fix for that issue: https://gerrit.cloudera.org/#/c/16523/ -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 20 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Wed, 30 Sep 2020 14:10:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 21: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6490/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 21 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Wed, 30 Sep 2020 14:10:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 20: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6488/ -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 20 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Wed, 30 Sep 2020 13:45:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 20: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6488/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 20 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Wed, 30 Sep 2020 08:21:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 20: Code-Review+2 Those XFAIL failures can be omitted, not those the ones that make the build fail. However, there is a failing test in this job: https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/12194/ I see test_full_acid_original_files failing but that seems to be flaky anyway so I'm sure it's unrelated to this change. https://issues.apache.org/jira/browse/IMPALA-10143 Let me re-run the verify job. Carry +2 from Zoltan. -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 20 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Wed, 30 Sep 2020 08:20:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 20: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7317/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 20 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Wed, 30 Sep 2020 02:17:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 20: gerrit-verify-dryrun failed, I got some message like that: === short test summary info XFAIL custom_cluster/test_alloc_fail.py::TestAllocFail::()::test_alloc_fail_update[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none] IMPALA-2925: the execution is not deterministic so some tests sometimes don't fail as expected XFAIL custom_cluster/test_process_failures.py::TestProcessFailures::()::test_restart_catalogd reason: [NOTRUN] IMPALA-9848 XFAIL authorization/test_ranger.py::TestRanger::()::test_local_catalog_ownership reason: getTableIfCached() faulty behavior, known issue === 226 passed, 116 skipped, 3 xfailed, 6 xpassed in 5690.29 seconds === It this three xfailed cases caused this job failed? These three xfailed cases seems unrelated to Iceberg. My own core test on Jenkins based on patch 18 are passed: https://jenkins.impala.io/job/pre-review-test/723/ -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 20 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Wed, 30 Sep 2020 02:02:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
wangsheng has uploaded a new patch set (#20). ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. IMPALA-10164: Supporting HadoopCatalog for Iceberg table This patch mainly realizes creating Iceberg table by HadoopCatalog. We only supported HadoopTables api before this patch, but now we can use HadoopCatalog to create Iceberg table. When creating managed table, we can use SQL like this: CREATE TABLE default.iceberg_test ( level string, event_time timestamp, message string, ) STORED AS ICEBERG TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog', 'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test'); We supported two values ('hadoop.catalog', 'hadoop.tables') for 'iceberg.catalog' now. If you don't specify this property in your SQL, default catalog type is 'hadoop.catalog'. As for external Iceberg table, you can use SQL like this: CREATE EXTERNAL TABLE default.iceberg_test_external STORED AS ICEBERG TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog', 'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test', 'iceberg.table_identifier'='default.iceberg_test'); We cannot set table location for both managed and external Iceberg table with 'hadoop.catalog', and 'SHOW CREATE TABLE' will not display table location yet. We need to use 'DESCRIBE FORMATTED/EXTENDED' to get this location info. 'iceberg.catalog_location' is necessary for 'hadoop.catalog' table, which used to reserved Iceberg table metadata and data, and we use this location to load table metadata from Iceberg. 'iceberg.table_identifier' is used for Icebreg TableIdentifier.If this property not been specified in SQL, Impala will use database and table name to load Iceberg table, which is 'default.iceberg_test_external' in above SQL. This property value is splitted by '.', you can alse set this value like this: 'org.my_db.my_tbl'. And this property is valid for both managed and external table. Testing: - Create table tests in functional_schema_template.sql - Iceberg table create test in test_iceberg.py - Iceberg table query test in test_scanners.py - Iceberg table show create table test in test_show_create_table.py Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef --- M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/1-1-bc402da0-b562-4310-9001-06f9b6b0f9ae-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/6-6-d253aefa-65fc-4698-8f26-b155fc965cf6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/9-9-5d04b016-05e1-43fc-b4a0-0e0df52a5035-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-20b92523-c3b9-401d-b429-363c245dbe9c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-c86370cf-10a1-4e49-86dc-b094fe739aa6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-f32f86fa-286f-4cd3-8337-98685c48176d-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-b18d2bbc-46a2-4040-a4a8-7488447de3b6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-c9bda250-ed1c-4868-bbf1-f2aad65fa80c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/4-4-0ed77823-ded1-4a12-9e03-4027cd43966a-0.parquet A
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 19: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6482/ -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 19 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Tue, 29 Sep 2020 19:27:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 19: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6482/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 19 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Tue, 29 Sep 2020 13:59:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 19: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 19 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Tue, 29 Sep 2020 13:59:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 18: Code-Review+2 Thanks for applying the changes, LGTM! -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 18 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Tue, 29 Sep 2020 13:23:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 18: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7310/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 18 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Tue, 29 Sep 2020 12:53:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 18: (6 comments) Thanks for review, Zoltan. I think it's better to keep these in a patch. http://gerrit.cloudera.org:8080/#/c/16446/15//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16446/15//COMMIT_MSG@29 PS15, Line 29: table_iden > Probably 'table_identifier' would be more appropriate as it would follow Ic Done http://gerrit.cloudera.org:8080/#/c/16446/15/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java: http://gerrit.cloudera.org:8080/#/c/16446/15/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java@592 PS15, Line 592: handler != null && !IcebergTable.isIcebergStorageHa > Are we sure that we want to disallow this? Done http://gerrit.cloudera.org:8080/#/c/16446/15/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/16446/15/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2582 PS15, Line 2582: if (params.if_not_exists && : catalog_.containsTable(tableName.getDb(), tableName.getTbl())) > Instead of figuring out the location, we should get it from Iceberg: Done http://gerrit.cloudera.org:8080/#/c/16446/15/testdata/datasets/functional/functional_schema_template.sql File testdata/datasets/functional/functional_schema_template.sql: http://gerrit.cloudera.org:8080/#/c/16446/15/testdata/datasets/functional/functional_schema_template.sql@2969 PS15, Line 2969: > nit: please start new line Done http://gerrit.cloudera.org:8080/#/c/16446/15/testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test: http://gerrit.cloudera.org:8080/#/c/16446/15/testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test@154 PS15, Line 154: > nit: please try to keep lines less than 90 chars in the QUERY section. Done http://gerrit.cloudera.org:8080/#/c/16446/15/testdata/workloads/functional-query/queries/QueryTest/show-create-table.test File testdata/workloads/functional-query/queries/QueryTest/show-create-table.test: http://gerrit.cloudera.org:8080/#/c/16446/15/testdata/workloads/functional-query/queries/QueryTest/show-create-table.test@585 PS15, Line 585: > nit: please try to keep lines less than 90 chars. Done -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 18 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Tue, 29 Sep 2020 12:34:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
wangsheng has uploaded a new patch set (#18). ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. IMPALA-10164: Supporting HadoopCatalog for Iceberg table This patch mainly realizes creating Iceberg table by HadoopCatalog. We only supported HadoopTables api before this patch, but now we can use HadoopCatalog to create Iceberg table. When creating managed table, we can use SQL like this: CREATE TABLE default.iceberg_test ( level string, event_time timestamp, message string, ) STORED AS ICEBERG TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog', 'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test'); We supported two values ('hadoop.catalog', 'hadoop.tables') for 'iceberg.catalog' now. If you don't specify this property in your SQL, default catalog type is 'hadoop.catalog'. As for external Iceberg table, you can use SQL like this: CREATE EXTERNAL TABLE default.iceberg_test_external STORED AS ICEBERG TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog', 'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test', 'iceberg.table_identifier'='default.iceberg_test'); We cannot set table location for both managed and external Iceberg table with 'hadoop.catalog', and 'SHOW CREATE TABLE' will not display table location yet. We need to use 'DESCRIBE FORMATTED/EXTENDED' to get this location info. 'iceberg.catalog_location' is necessary for 'hadoop.catalog' table, which used to reserved Iceberg table metadata and data, and we use this location to load table metadata from Iceberg. 'iceberg.table_identifier' is used for Icebreg TableIdentifier.If this property not been specified in SQL, Impala will use database and table name to load Iceberg table, which is 'default.iceberg_test_external' in above SQL. This property value is splitted by '.', you can alse set this value like this: 'org.my_db.my_tbl'. And this property is valid for both managed and external table. Testing: - Create table tests in functional_schema_template.sql - Iceberg table create test in test_iceberg.py - Iceberg table query test in test_scanners.py - Iceberg table show create table test in test_show_create_table.py Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef --- M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/1-1-bc402da0-b562-4310-9001-06f9b6b0f9ae-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/6-6-d253aefa-65fc-4698-8f26-b155fc965cf6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/9-9-5d04b016-05e1-43fc-b4a0-0e0df52a5035-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-20b92523-c3b9-401d-b429-363c245dbe9c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-c86370cf-10a1-4e49-86dc-b094fe739aa6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-f32f86fa-286f-4cd3-8337-98685c48176d-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-b18d2bbc-46a2-4040-a4a8-7488447de3b6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-c9bda250-ed1c-4868-bbf1-f2aad65fa80c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/4-4-0ed77823-ded1-4a12-9e03-4027cd43966a-0.parquet A
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 17: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7306/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 17 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Tue, 29 Sep 2020 05:00:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 16: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7305/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 16 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Tue, 29 Sep 2020 04:56:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
wangsheng has uploaded a new patch set (#17). ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. IMPALA-10164: Supporting HadoopCatalog for Iceberg table This patch mainly realizes creating Iceberg table by HadoopCatalog. We only supported HadoopTables api before this patch, but now we can use HadoopCatalog to create Iceberg table. When creating managed table, we can use SQL like this: CREATE TABLE default.iceberg_test ( level string, event_time timestamp, message string, ) STORED AS ICEBERG TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog', 'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test'); We supported two values ('hadoop.catalog', 'hadoop.tables') for 'iceberg.catalog' now. If you don't specify this property in your SQL, default catalog type is 'hadoop.catalog'. As for external Iceberg table, you can use SQL like this: CREATE EXTERNAL TABLE default.iceberg_test_external STORED AS ICEBERG TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog', 'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test', 'iceberg.table_identifier'='default.iceberg_test'); We cannot set table location for both managed and external Iceberg table with 'hadoop.catalog', and 'SHOW CREATE TABLE' will not display table location yet. We need to use 'DESCRIBE FORMATTED/EXTENDED' to get this location info. 'iceberg.catalog_location' is necessary for 'hadoop.catalog' table, which used to reserved Iceberg table metadata and data, and we use this location to load table metadata from Iceberg. 'iceberg.table_identifier' is used for Icebreg TableIdentifier.If this property not been specified in SQL, Impala will use database and table name to load Iceberg table, which is 'default.iceberg_test_external' in above SQL. This property value is splitted by '.', you can alse set this value like this: 'org.my_db.my_tbl'. And this property is valid for both managed and external table. Testing: - Create table tests in functional_schema_template.sql - Iceberg table create test in test_iceberg.py - Iceberg table query test in test_scanners.py - Iceberg table show create table test in test_show_create_table.py Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef --- M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/1-1-bc402da0-b562-4310-9001-06f9b6b0f9ae-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/6-6-d253aefa-65fc-4698-8f26-b155fc965cf6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/9-9-5d04b016-05e1-43fc-b4a0-0e0df52a5035-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-20b92523-c3b9-401d-b429-363c245dbe9c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-c86370cf-10a1-4e49-86dc-b094fe739aa6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-f32f86fa-286f-4cd3-8337-98685c48176d-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-b18d2bbc-46a2-4040-a4a8-7488447de3b6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-c9bda250-ed1c-4868-bbf1-f2aad65fa80c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/4-4-0ed77823-ded1-4a12-9e03-4027cd43966a-0.parquet A
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 16: (1 comment) http://gerrit.cloudera.org:8080/#/c/16446/16/fe/src/main/java/org/apache/impala/util/IcebergUtil.java File fe/src/main/java/org/apache/impala/util/IcebergUtil.java: http://gerrit.cloudera.org:8080/#/c/16446/16/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@109 PS16, Line 109: return getIcebergTableMetadata(table.getIcebergCatalog(), getIcebergTableIdentifier(table), line too long (95 > 90) -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 16 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Tue, 29 Sep 2020 04:36:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
wangsheng has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. IMPALA-10164: Supporting HadoopCatalog for Iceberg table This patch mainly realizes creating Iceberg table by HadoopCatalog. We only supported HadoopTables api before this patch, but now we can use HadoopCatalog to create Iceberg table. When creating managed table, we can use SQL like this: CREATE TABLE default.iceberg_test ( level string, event_time timestamp, message string, ) STORED AS ICEBERG TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog', 'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test'); We supported two values ('hadoop.catalog', 'hadoop.tables') for 'iceberg.catalog' now. If you don't specify this property in your SQL, default catalog type is 'hadoop.catalog'. As for external Iceberg table, you can use SQL like this: CREATE EXTERNAL TABLE default.iceberg_test_external STORED AS ICEBERG TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog', 'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test', 'iceberg.table_identifier'='default.iceberg_test'); We cannot set table location for both managed and external Iceberg table with 'hadoop.catalog', and 'SHOW CREATE TABLE' will not display table location yet. We need to use 'DESCRIBE FORMATTED/EXTENDED' to get this location info. 'iceberg.catalog_location' is necessary for 'hadoop.catalog' table, which used to reserved Iceberg table metadata and data, and we use this location to load table metadata from Iceberg. 'iceberg.table_identifier' is used for Icebreg TableIdentifier.If this property not been specified in SQL, Impala will use database and table name to load Iceberg table, which is 'default.iceberg_test_external' in above SQL. This property value is splitted by '.', you can alse set this value like this: 'org.my_db.my_tbl'. And this property is valid for both managed and external table. Testing: - Create table tests in functional_schema_template.sql - Iceberg table create test in test_iceberg.py - Iceberg table query test in test_scanners.py - Iceberg table show create table test in test_show_create_table.py Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef --- M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/1-1-bc402da0-b562-4310-9001-06f9b6b0f9ae-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/6-6-d253aefa-65fc-4698-8f26-b155fc965cf6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/9-9-5d04b016-05e1-43fc-b4a0-0e0df52a5035-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-20b92523-c3b9-401d-b429-363c245dbe9c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-c86370cf-10a1-4e49-86dc-b094fe739aa6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-f32f86fa-286f-4cd3-8337-98685c48176d-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-b18d2bbc-46a2-4040-a4a8-7488447de3b6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-c9bda250-ed1c-4868-bbf1-f2aad65fa80c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/4-4-0ed77823-ded1-4a12-9e03-4027cd43966a-0.parquet A
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 15: Code-Review+1 (6 comments) Thanks for applying the changes! I think the change looks good, I only had some minor comments. http://gerrit.cloudera.org:8080/#/c/16446/15//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16446/15//COMMIT_MSG@29 PS15, Line 29: table_name Probably 'table_identifier' would be more appropriate as it would follow Iceberg's terminology. http://gerrit.cloudera.org:8080/#/c/16446/15/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java: http://gerrit.cloudera.org:8080/#/c/16446/15/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java@592 PS15, Line 592: // Only external table can set 'iceberg.table_name' Are we sure that we want to disallow this? I'm thinking of a use case when users use Iceberg's HadoopCatalog with more fine grained table identifiers, e.g. 'organization.database.table_name'. I think Iceberg allows arbitraly deep table identifiers. So users might also want to create some tables managed by Impala, but with the above schema, e.g.: CREATE TABLE my_hms_db.my_ice_table (i int) STORED AS ICEBERG TBLPROPERTIES( 'iceberg.catalog' = 'hadoop.catalog', 'iceberg.catalog_location' = '/warehouse/ice-catalog', 'iceberg.table_identifier' = 'my_organization.my_ice_db.my_table'); So if 'table_identifier' is missing, we should use 'db_name.tbl_name' as Iceberg table identifier, otherwise we should use the explicit param. Though I'm also happy with the current behavior if you don't want to deal with it in this patch. http://gerrit.cloudera.org:8080/#/c/16446/15/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/16446/15/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2582 PS15, Line 2582: newTable.getSd().setLocation(String.format("%s/%s/%s", location, : newTable.getDbName(), newTable.getTableName())); Instead of figuring out the location, we should get it from Iceberg: https://github.com/apache/iceberg/blob/master/api/src/main/java/org/apache/iceberg/Table.java#L94 So if Iceberg change its location-computing algorithm in the future, HMS will still point to the right place. http://gerrit.cloudera.org:8080/#/c/16446/15/testdata/datasets/functional/functional_schema_template.sql File testdata/datasets/functional/functional_schema_template.sql: http://gerrit.cloudera.org:8080/#/c/16446/15/testdata/datasets/functional/functional_schema_template.sql@2969 PS15, Line 2969: 'iceberg.catalog_location' nit: please start new line http://gerrit.cloudera.org:8080/#/c/16446/15/testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test: http://gerrit.cloudera.org:8080/#/c/16446/15/testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test@154 PS15, Line 154: 'iceberg.catalog_location'='/$DATABASE/hadoop_catalog_test'); nit: please try to keep lines less than 90 chars in the QUERY section. http://gerrit.cloudera.org:8080/#/c/16446/15/testdata/workloads/functional-query/queries/QueryTest/show-create-table.test File testdata/workloads/functional-query/queries/QueryTest/show-create-table.test: http://gerrit.cloudera.org:8080/#/c/16446/15/testdata/workloads/functional-query/queries/QueryTest/show-create-table.test@585 PS15, Line 585: 'iceberg.catalog_location'='$$location_uri$$') nit: please try to keep lines less than 90 chars. -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 15 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Mon, 28 Sep 2020 12:30:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 15: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7294/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 15 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Mon, 28 Sep 2020 08:01:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 15: > Hi WangSheng, > > Yes, it's what I meant, I only have minor corrections, answered > inline: > > > 1. We prohibit explicitly setting the table LOCATION when using > 'hadoop.catalog' > Yes, I think we should prohibit it and let Iceberg decide the > location. > > > 2. When execute 'SHOW CREATE TABLE', table location is > 'hdfs://test-warehouse/ice-catalog/default/ice_t' > If we prohibit setting table location, then we can't output it in > SHOW CREATE TABLE. > > > 3. When execute 'DESCRIBE FORMATTED', table location is > 'hdfs://test-warehouse/ice-catalog/default/ice_t', and > 'iceberg.catalog_location' is 'hdfs://test-warehouse/ice-catalog' > Yes. > > > 4. When execute 'DROP TABLE', we only delete > > 'hdfs://test-warehouse/ice-catalog/default/ice_t', > and reserve 'hdfs://test-warehouse/ice-catalog' > Yes. > > > 5. 'iceberg.catalog_location' is necessary for both managed and > external 'hadoop.catalog' Iceberg table > Yes. Maybe later we could also have a default location somewhere > under the warehouse directory. But for now I think explicit catalog > locations are better. > > > 6. When loading managed table, we use > > load('iceberg.catalog_location',TableIdentifier(db.tbl)) > Yes. > > > 7. When loading external table, we use > > load('iceberg.catalog_location',TableIdentifier('iceberg.table_name')) > Yes, maybe rename 'iceberg.table_name' to 'iceberg.table_identifier'? > And the value of it should be a dot-separated string of names that > was originally used to create the Iceberg table. Iceberg can > already make table identifiers from dot-separated strings: > https://github.com/apache/iceberg/blob/1a797cd986c9193f2e422fec295aaf25ce7e1916/api/src/main/java/org/apache/iceberg/catalog/TableIdentifier.java#L47 > > > > > Does my understanding correct? > > > > And still some questions: > > > > 1. When creating external 'hadoop.catalog' Iceberg table, are we > load table by 'iceberg.catalog_location' and 'iceberg.table_name'? > If so, both these two properties are necessary. > So instead of 'iceberg.table_name' we could have 'iceberg.table_identifier', > just to use Iceberg terminology and semantics. Yeah, probably we > want it to be a required property for EXTERNAL hadoop catalog > tables. I just wonder if it would make sense to infer it from > 'db_name.tbl_name'. But let's start with it as a required property, > then later we'll se if inferring it makes sense and add support for > it in a future patch. > > > 2. Can we set the table LOCATION for 'hadoop.catalog' external > table? If not, how do we define table LOCATION? Using > 'iceberg.catalog_location'+/db/external_tbl? Or use > 'iceberg.catalog_location'+'iceberg.table_name'? > I think we should not set LOCATION. Iceberg uses catalog location + > table identifier to find and load tables, so I think we should > provide those: > 'iceberg.catalog_location'+'iceberg.table_identifier' > And if Iceberg can load the table with the given information, we > can set the actual table location. If the table is not found, we > should raise an error. Hi Zoltan, thanks for your careful review and patient reply. I've already modify code according to your advice. A Jenkins test has already been submitted: https://jenkins.impala.io/job/pre-review-test/721/ -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 15 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Mon, 28 Sep 2020 07:45:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
wangsheng has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. IMPALA-10164: Supporting HadoopCatalog for Iceberg table This patch mainly realizes creating Iceberg table by HadoopCatalog. We only supported HadoopTables api before this patch, but now we can use HadoopCatalog to create Iceberg table. When creating managed table, we can use SQL like this: CREATE TABLE default.iceberg_test ( level string, event_time timestamp, message string, ) STORED AS ICEBERG TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog', 'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test'); We supported two values ('hadoop.catalog', 'hadoop.tables') for 'iceberg.catalog' now. If you don't specify this property in your SQL, default catalog type is 'hadoop.catalog'. As for external Iceberg table, you can use SQL like this: CREATE EXTERNAL TABLE default.iceberg_test_external STORED AS ICEBERG TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog', 'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test', 'iceberg.table_name'='default.iceberg_test'); We cannot set table location for both managed and external Iceberg table with 'hadoop.catalog', and 'SHOW CREATE TABLE' will not display table location yet. We need to use 'DESCRIBE FORMATTED/EXTENDED' to get this location info. 'iceberg.catalog_location' is necessary for 'hadoop.catalog' table, which used to reserved Iceberg table metadata and data, and we use this location to load table metadata from Iceberg. 'iceberg.table_name' is the managed Iceberg table name, just like 'kudu.table_name' when creating external Kudu table. If this property not been specified in SQL, Impala will use database and table name to load Iceberg table, which is 'default.iceberg_test_external' in above SQL. This property cannot be set with managed table. Testing: - Create table tests in functional_schema_template.sql - Iceberg table create test in test_iceberg.py - Iceberg table query test in test_scanners.py - Iceberg table show create table test in test_show_create_table.py Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef --- M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/1-1-bc402da0-b562-4310-9001-06f9b6b0f9ae-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/6-6-d253aefa-65fc-4698-8f26-b155fc965cf6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/9-9-5d04b016-05e1-43fc-b4a0-0e0df52a5035-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-20b92523-c3b9-401d-b429-363c245dbe9c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-c86370cf-10a1-4e49-86dc-b094fe739aa6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-f32f86fa-286f-4cd3-8337-98685c48176d-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-b18d2bbc-46a2-4040-a4a8-7488447de3b6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-c9bda250-ed1c-4868-bbf1-f2aad65fa80c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/4-4-0ed77823-ded1-4a12-9e03-4027cd43966a-0.parquet A
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 14: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7288/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 14 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Sun, 27 Sep 2020 13:50:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
wangsheng has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. IMPALA-10164: Supporting HadoopCatalog for Iceberg table This patch mainly realizes creating Iceberg table by HadoopCatalog. We only supported HadoopTables api before this patch, but now we can use HadoopCatalog to create Iceberg table. When creating managed table, we can use SQL like this: CREATE TABLE default.iceberg_test ( level string, event_time timestamp, message string, ) STORED AS ICEBERG TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog', 'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test'); We supported two values ('hadoop.catalog', 'hadoop.tables') for 'iceberg.catalog' now. If you don't specify this property in your SQL, default catalog type is 'hadoop.catalog'. As for external Iceberg table, you can use SQL like this: CREATE EXTERNAL TABLE default.iceberg_test_external STORED AS ICEBERG TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog', 'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test', 'iceberg.table_name'='default.iceberg_test'); We cannot set table location for both managed and external Iceberg table with 'hadoop.catalog', and 'SHOW CREATE TABLE' will not display table location yet. We need to use 'DESCRIBE FORMATTED/EXTENDED' to get this location info. 'iceberg.catalog_location' is necessary for 'hadoop.catalog' table, which used to reserved Iceberg table metadata and data, and we use this location to load table metadata from Iceberg. 'iceberg.table_name' is the managed Iceberg table name, just like 'kudu.table_name' when creating external Kudu table. If this property not been specified in SQL, Impala will use database and table name to load Iceberg table, which is 'default.iceberg_test_external' in above SQL. This property cannot be set with managed table. Testing: - Create table tests in functional_schema_template.sql - Iceberg table create test in test_iceberg.py - Iceberg table query test in test_scanners.py - Iceberg table show create table test in test_show_create_table.py Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef --- M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/1-1-bc402da0-b562-4310-9001-06f9b6b0f9ae-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/6-6-d253aefa-65fc-4698-8f26-b155fc965cf6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/9-9-5d04b016-05e1-43fc-b4a0-0e0df52a5035-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-20b92523-c3b9-401d-b429-363c245dbe9c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-c86370cf-10a1-4e49-86dc-b094fe739aa6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-f32f86fa-286f-4cd3-8337-98685c48176d-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-b18d2bbc-46a2-4040-a4a8-7488447de3b6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-c9bda250-ed1c-4868-bbf1-f2aad65fa80c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/4-4-0ed77823-ded1-4a12-9e03-4027cd43966a-0.parquet A
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 13: Hi WangSheng, Yes, it's what I meant, I only have minor corrections, answered inline: > 1. We prohibit explicitly setting the table LOCATION when using > 'hadoop.catalog' Yes, I think we should prohibit it and let Iceberg decide the location. > 2. When execute 'SHOW CREATE TABLE', table location is > 'hdfs://test-warehouse/ice-catalog/default/ice_t' If we prohibit setting table location, then we can't output it in SHOW CREATE TABLE. > 3. When execute 'DESCRIBE FORMATTED', table location is > 'hdfs://test-warehouse/ice-catalog/default/ice_t', and > 'iceberg.catalog_location' is 'hdfs://test-warehouse/ice-catalog' Yes. > 4. When execute 'DROP TABLE', we only delete > 'hdfs://test-warehouse/ice-catalog/default/ice_t', and reserve > 'hdfs://test-warehouse/ice-catalog' Yes. > 5. 'iceberg.catalog_location' is necessary for both managed and external > 'hadoop.catalog' Iceberg table Yes. Maybe later we could also have a default location somewhere under the warehouse directory. But for now I think explicit catalog locations are better. > 6. When loading managed table, we use > load('iceberg.catalog_location',TableIdentifier(db.tbl)) Yes. > 7. When loading external table, we use > load('iceberg.catalog_location',TableIdentifier('iceberg.table_name')) Yes, maybe rename 'iceberg.table_name' to 'iceberg.table_identifier'? And the value of it should be a dot-separated string of names that was originally used to create the Iceberg table. Iceberg can already make table identifiers from dot-separated strings: https://github.com/apache/iceberg/blob/1a797cd986c9193f2e422fec295aaf25ce7e1916/api/src/main/java/org/apache/iceberg/catalog/TableIdentifier.java#L47 > > Does my understanding correct? > > And still some questions: > > 1. When creating external 'hadoop.catalog' Iceberg table, are we load table > by 'iceberg.catalog_location' and 'iceberg.table_name'? If so, both these two > properties are necessary. So instead of 'iceberg.table_name' we could have 'iceberg.table_identifier', just to use Iceberg terminology and semantics. Yeah, probably we want it to be a required property for EXTERNAL hadoop catalog tables. I just wonder if it would make sense to infer it from 'db_name.tbl_name'. But let's start with it as a required property, then later we'll se if inferring it makes sense and add support for it in a future patch. > 2. Can we set the table LOCATION for 'hadoop.catalog' external table? If not, > how do we define table LOCATION? Using > 'iceberg.catalog_location'+/db/external_tbl? Or use > 'iceberg.catalog_location'+'iceberg.table_name'? I think we should not set LOCATION. Iceberg uses catalog location + table identifier to find and load tables, so I think we should provide those: 'iceberg.catalog_location'+'iceberg.table_identifier' And if Iceberg can load the table with the given information, we can set the actual table location. If the table is not found, we should raise an error. -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 13 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Fri, 25 Sep 2020 16:30:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 13: > Hi WangSheng, thank you for your reply. > > I think we should choose the option that causes the least > confusions. > With the current solution the users can create two tables in the > following way: > > CREATE TABLE ice_1 (i int) STORED AS ICEBERG LOCATION > 'hdfs://test-warehouse/ice-catalog'; > CREATE TABLE ice_2 (s string) STORED AS ICEBERG LOCATION > 'hdfs://test-warehouse/ice-catalog'; > > ice_1 and ice_2 will be created in the same hadoop catalog, but > they can contain their own data under their own (implicit) table > location. > I think it's a valid use case as iceberg supports creating multiple > tables in the same catalog. Moreover, it's probably how Iceberg > catalogs are meant to be used. > Now if the user DROPs one of them, HMS will remove the whole > catalog, possibly causing an unintended data loss. > > Note that this case is different than having two managed PARQUET > tables based on same location, because in that case the tables > point to the same data. Also I cannot think of a use case when > users should do that. > > I propose the following: we should introduce a new table property: > 'iceberg.catalog_location'. > > So users would create tables with the following statement if they > want to use hadoop catalog: > > CREATE TABLE ice_t (i int) > STORED AS ICEBERG > TBLPROPERTIES('iceberg.catalog'='hadoop_catalog', > 'iceberg.catalog_location'='hdfs://test-warehouse/ice-catalog'); > > In that case it would be quite explicit what's happening. And we'd > set the table's location to what Iceberg computes for the table > (from + ). > We probably even want to prohibit explicitly setting the table > LOCATION (so SHOW CREATE TABLE shouldn't include it either) when > using hadoop catalog. > > So DROP TABLE wouldn't affect other tables, and DESCRIBE FORMATTED > would automatically show the actual table LOCATION and > 'iceberg.catalog_location' (since it's a table property). > If we DROP all the tables in an iceberg catalog, then the empty > catalog directory will still remain, but I don't see that as a > serious issue. > > Tables created via HadoopTables are not affected, i.e. they > continue to work like they already work. > > What do you think about this approach? Hi Zoltan, 'iceberg.catalog_location' is indeed a good approach, and here is some of my understanding for this property. For this table: CREATE TABLE default.ice_t (i int) STORED AS ICEBERG TBLPROPERTIES('iceberg.catalog'='hadoop_catalog', 'iceberg.catalog_location'='hdfs://test-warehouse/ice-catalog'); 1. We prohibit explicitly setting the table LOCATION when using 'hadoop.catalog' 2. When execute 'SHOW CREATE TABLE', table location is 'hdfs://test-warehouse/ice-catalog/default/ice_t' 3. When execute 'DESCRIBE FORMATTED', table location is 'hdfs://test-warehouse/ice-catalog/default/ice_t', and 'iceberg.catalog_location' is 'hdfs://test-warehouse/ice-catalog' 4. When execute 'DROP TABLE', we only delete 'hdfs://test-warehouse/ice-catalog/default/ice_t', and reserve 'hdfs://test-warehouse/ice-catalog' 5. 'iceberg.catalog_location' is necessary for both managed and external 'hadoop.catalog' Iceberg table 6. When loading managed table, we use load('iceberg.catalog_location',TableIdentifier(db.tbl)) 7. When loading external table, we use load('iceberg.catalog_location',TableIdentifier('iceberg.table_name')) Does my understanding correct? And still some questions: 1. When creating external 'hadoop.catalog' Iceberg table, are we load table by 'iceberg.catalog_location' and 'iceberg.table_name'? If so, both these two properties are necessary. 2. Can we set the table LOCATION for 'hadoop.catalog' external table? If not, how do we define table LOCATION? Using 'iceberg.catalog_location'+/db/external_tbl? Or use 'iceberg.catalog_location'+'iceberg.table_name'? -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 13 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Fri, 25 Sep 2020 14:25:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 13: Hi WangSheng, thank you for your reply. I think we should choose the option that causes the least confusions. With the current solution the users can create two tables in the following way: CREATE TABLE ice_1 (i int) STORED AS ICEBERG LOCATION 'hdfs://test-warehouse/ice-catalog'; CREATE TABLE ice_2 (s string) STORED AS ICEBERG LOCATION 'hdfs://test-warehouse/ice-catalog'; ice_1 and ice_2 will be created in the same hadoop catalog, but they can contain their own data under their own (implicit) table location. I think it's a valid use case as iceberg supports creating multiple tables in the same catalog. Moreover, it's probably how Iceberg catalogs are meant to be used. Now if the user DROPs one of them, HMS will remove the whole catalog, possibly causing an unintended data loss. Note that this case is different than having two managed PARQUET tables based on same location, because in that case the tables point to the same data. Also I cannot think of a use case when users should do that. I propose the following: we should introduce a new table property: 'iceberg.catalog_location'. So users would create tables with the following statement if they want to use hadoop catalog: CREATE TABLE ice_t (i int) STORED AS ICEBERG TBLPROPERTIES('iceberg.catalog'='hadoop_catalog', 'iceberg.catalog_location'='hdfs://test-warehouse/ice-catalog'); In that case it would be quite explicit what's happening. And we'd set the table's location to what Iceberg computes for the table (from + ). We probably even want to prohibit explicitly setting the table LOCATION (so SHOW CREATE TABLE shouldn't include it either) when using hadoop catalog. So DROP TABLE wouldn't affect other tables, and DESCRIBE FORMATTED would automatically show the actual table LOCATION and 'iceberg.catalog_location' (since it's a table property). If we DROP all the tables in an iceberg catalog, then the empty catalog directory will still remain, but I don't see that as a serious issue. Tables created via HadoopTables are not affected, i.e. they continue to work like they already work. What do you think about this approach? -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 13 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Fri, 25 Sep 2020 10:04:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 13: > (2 comments) Hi Zoltan, I've already read your reply closely, it seems we have some different understanding, here is some of my opinions for this patch: 1. We use location in SQL as table root path, like '/test-warehouse/iceberg_test/hadoop_catalog/hadoop_catalog_test', regardless of the structure under this location. If we use 'hadoop.catalog', the structure like this: /test-warehouse/iceberg_test/hadoop_catalog/my_db/my_table/metadata/xxx /test-warehouse/iceberg_test/hadoop_catalog/my_db/my_table/data/xxx And if we use 'hadoop.tables', the structure like this: /test-warehouse/iceberg_test/hadoop_catalog/metadata/xxx /test-warehouse/iceberg_test/hadoop_catalog/data/xxx In this situation, whether creating managed or external Iceberg table based on 'hadoop.catalog'/'hadoop.tables', we just need to provide a location '/test-warehouse/iceberg_test/hadoop_catalog'. Even if you don't provide a location in SQL when creating managed Iceberg table, we will also use '$DEFAULT_WAREHOUSE/my_table' as table root path. I think this keep the consist of 'hadoop.catalog' and 'hadoop.tables'. So we just need to remember a root table path. 2. Based on above situation, when creating two managed table on same location based on 'hadoop.catalog', drop one of the table, the location will be deleted by HMS. And I think this keep the consist of HdfsTable and IcebergTable. For example, when creating two managed PARQUET tables based on same location, drop one of the table, the whole location will also be deleted by HMS. Based on above opinions, here is some of my questions: 1. If using HadoopCatalog.dropTable in code, the root path 'test-warehouse/iceberg_test/hadoop_catalog' will be reserved, just deleted '/my_db/my_table', this is different from HadoopTables or normal HdfsTable, which DROP TABLE will delete whole location, does this make users feel confused? 2. DESCRIBE FORMATTED shows the actual table location '/test-warehouse/iceberg_test/hadoop_catalog/my_db/my_table', but SHOW CREATE TABLE shows sql location '/test-warehouse/iceberg_test/hadoop_catalog', I‘m not sure if this is appropriate for two queries return different location on same table. But same location for 'hadoop.tables'. If you think the above two modifications are indeed better, I will adjust code in current patch as soon as possible. -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 13 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Fri, 25 Sep 2020 02:39:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 13: > (2 comments) Hi Zoltan, I've already read your reply closely, it seems we have some different understanding, here is some of my opinions for this patch: 1. We use location in SQL as table root path, like '/test-warehouse/iceberg_test/hadoop_catalog/hadoop_catalog_test', regardless of the structure under this location. If we use 'hadoop.catalog', the structure like this: /test-warehouse/iceberg_test/hadoop_catalog/my_db/my_table/metadata/xxx /test-warehouse/iceberg_test/hadoop_catalog/my_db/my_table/data/xxx And if we use 'hadoop.tables', the structure like this: /test-warehouse/iceberg_test/hadoop_catalog/metadata/xxx /test-warehouse/iceberg_test/hadoop_catalog/data/xxx In this situation, whether creating managed or external Iceberg table based on 'hadoop.catalog'/'hadoop.tables', we just need to provide a location '/test-warehouse/iceberg_test/hadoop_catalog'. Even if you don't provide a location in SQL when creating managed Iceberg table, we will also use '$DEFAULT_WAREHOUSE/my_table' as table root path. I think this keep the consist of 'hadoop.catalog' and 'hadoop.tables'. So we just need to remember a root table path. 2. Based on above situation, when creating two managed table on same location based on 'hadoop.catalog', drop one of the table, the location will be deleted by HMS. And I think this keep the consist of HdfsTable and IcebergTable. For example, when creating two managed PARQUET tables based on same location, drop one of the table, the whole location will also be deleted by HMS. Based on above opinions, here is some of my questions: 1. If using HadoopCatalog.dropTable in code, the root path 'test-warehouse/iceberg_test/hadoop_catalog' will be reserved, just deleted '/my_db/my_table', this is different from HadoopTables or normal HdfsTable, which DROP TABLE will delete whole location, does this make users feel confused? 2. DESCRIBE FORMATTED shows the actual table location '/test-warehouse/iceberg_test/hadoop_catalog/my_db/my_table', but SHOW CREATE TABLE shows sql location '/test-warehouse/iceberg_test/hadoop_catalog', I‘m not sure if this is appropriate for two queries return different location on same table. But same location for 'hadoop.tables'. If you think the above two modifications are indeed better, I will adjust code in current patch as soon as possible. -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 13 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Fri, 25 Sep 2020 02:38:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 13: (2 comments) http://gerrit.cloudera.org:8080/#/c/16446/13/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test: http://gerrit.cloudera.org:8080/#/c/16446/13/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test@80 PS13, Line 80: row_regex:.*CAUSED BY: NoSuchTableException: Table does not exist: $DATABASE.iceberg_test7* Since the test only drops 'iceberg_test6' I think the expected behavior is that 'iceberg_test7' still exists. I'm OK with doing this in a separate patch and Jira, but in that case we should prohibit dropping Iceberg tables in this patch because this behavior is unexpected and can cause potential data loss. We should probably use the dropTable (https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java#L183) method to drop the table from the Iceberg catalog, then drop the table from HMS, e.g. like here: https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L1918-L1920 But for IcebergTables we need to set the param 'purge' to false (Because the data files are already been purged by Iceberg. Also, the location parameter might points to the catalog location instead of the table location, and we don't want to remove the whole catalog). So I think in the end you might want to add a dropTable() method to IcebergCatalogOpExecutor that drops Iceberg tables properly, both from Iceberg and HMS. http://gerrit.cloudera.org:8080/#/c/16446/13/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test: http://gerrit.cloudera.org:8080/#/c/16446/13/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@305 PS13, Line 305: 'Location: ','$NAMENODE/test-warehouse/iceberg_test/hadoop_catalog/hadoop_catalog_test','NULL' Can we show the actual table location (https://github.com/apache/iceberg/blob/ada047be14205a88396e0ddf8b0650a9f44c/api/src/main/java/org/apache/iceberg/Table.java#L94), not the HadoopCatalog location here? So I think in the end we want the following behavior: * DROP TABLE only drops the referred table. If the table is truly external (so it doesn't have 'external.table.purge' property), we only drop the table from HMS, keeping the data files. * SHOW CREATE TABLE shows the CREATE TABLE statement that created the table, so for location it shows the hadoop catalog location * DESCRIBE FORMATTED shows the actual table location -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 13 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Thu, 24 Sep 2020 17:24:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 13: > Yeah, I just checked and there is no DESCRIBE EXTENDED test, > however there are DESCRIBE FORMATTED tests which is basically the > same thing. > > You can use > RESULTS: VERIFY_IS_SUBSET Already add two kinds of test case. -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 13 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Thu, 24 Sep 2020 12:41:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 13: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7270/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 13 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Thu, 24 Sep 2020 12:25:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
wangsheng has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. IMPALA-10164: Supporting HadoopCatalog for Iceberg table This patch mainly realizes creating Iceberg table by HadoopCatalog. We only supported HadoopTables api before this patch, but now we can use HadoopCatalog to create Iceberg table. When creating managed table, we can use SQL like this: CREATE TABLE default.iceberg_test ( level string, event_time timestamp, message string, ) STORED AS ICEBERG LOCATION 'hdfs://test-warehouse/hadoop_catalog_test' TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog'); We supported two values ('hadoop.catalog', 'hadoop.tables') for 'iceberg.catalog' now. If you don't specify this property in your SQL, default catalog type is 'hadoop.catalog'. As for external Iceberg table, you can use SQL like this: CREATE EXTERNAL TABLE default.iceberg_test_external STORED AS ICEBERG LOCATION 'hdfs://test-warehouse/hadoop_catalog_test' TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog', 'iceberg.table_name'='default.iceberg_test'); 'iceberg.table_name' is the managed Iceberg table name, just like 'kudu.table_name' when creating external Kudu table. If this property not been specified in SQL, Impala will use database and table name to load Iceberg table, which is 'default.iceberg_test_external' in above SQL. This property cannot be set with managed table. Testing: - Create table tests in functional_schema_template.sql - Iceberg table create test in test_iceberg.py - Iceberg table query test in test_scanners.py Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef --- M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/1-1-bc402da0-b562-4310-9001-06f9b6b0f9ae-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/6-6-d253aefa-65fc-4698-8f26-b155fc965cf6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/9-9-5d04b016-05e1-43fc-b4a0-0e0df52a5035-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-20b92523-c3b9-401d-b429-363c245dbe9c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-c86370cf-10a1-4e49-86dc-b094fe739aa6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-f32f86fa-286f-4cd3-8337-98685c48176d-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-b18d2bbc-46a2-4040-a4a8-7488447de3b6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-c9bda250-ed1c-4868-bbf1-f2aad65fa80c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/4-4-0ed77823-ded1-4a12-9e03-4027cd43966a-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00014-14-f698d7a4-245f-44d5-8a59-ed511854c8f8-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00015-15-7c1d5490-91f7-47bd-a3b6-e86caa7fe47d-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00019-19-d2ef5fcf-4346-421f-b2ef-1f9d55fb4c84-0.parquet A
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 12: Yeah, I just checked and there is no DESCRIBE EXTENDED test, however there are DESCRIBE FORMATTED tests which is basically the same thing. You can use RESULTS: VERIFY_IS_SUBSET -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 12 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Thu, 24 Sep 2020 09:06:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 12: > > Thanks for fixing the test! > > Currently I'm wondering about DROP TABLE. Could you please add a > > test that creates two Iceberg tables in the same HadoopCatalog, > > DROP one of them and check that the other one still exists. > > Could you please also add a DESCRIBE EXTENDED test to check the > > TBLPROPERTIES and LOCATION? > > Hi Zoltan, I don't understand about "creates two Iceberg tables in > the same HadoopCatalog". Do you mean create two managed table with > HadoopCatalog in different location? Or create two external table > with HadoopCatalog in same location? Or create a managed table, and > create a external location to this managed table? Thanks for your patient explain, Zoltan, and I understand. Of course your understanding is totally correct! I will add this test case lately. Another question is for 'DESCRIBE EXTENDED table', the result will returns lots of information, how do I define ' RESULTS' in test file? Use 'row_regex' to filter some info or list all values like this: '# col_name','data_type','comment' '','NULL','NULL' 'id','int','NULL' 'user','string','NULL' ... I didn't found any case of 'DESCRIBE EXTENDED table' in 'testdata/workloads/functional-query/queries/QueryTest', only found 'describe database extended xxx', which is very different from 'DESCRIBE EXTENDED xxx': RESULTS '','$USER','USER' 'Owner: ','','' 'impala_test_desc_db1','$NAMENODE/$EXTERNAL_WAREHOUSE_DIR/impala_test_desc_db1.db','' -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 12 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Thu, 24 Sep 2020 09:00:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 12: I was thinking about two managed Iceberg tables using the same hadoop catalog location. If I understand correctly, in CREATE TABLE we can provide the location of the HadoopCatalog, i.e. a directory in HDFS. Then the actual table location is derived from the HadoopCatalog location + DB_NAME + TABLE_NAME. So in the end the Iceberg table will be located at some directory like: hdfs://test-warehouse/iceberg-hadoop-catalog/my_database/my_table And if we create a second managed table with the same HadoopCatalog location, it will be placed at: hdfs://test-warehouse/iceberg-hadoop-catalog/my_database_2/my_table_2 But in the Hive Metastore, for table location we only store hdfs://test-warehouse/iceberg-hadoop-catalog/, right? So if my understanding is correct, on DROP TABLE we might delete the whole hdfs://test-warehouse/iceberg-hadoop-catalog/ directory. Or maybe I misunderstood something? But adding such test is useful anyway in my opinion. -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 12 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Thu, 24 Sep 2020 08:37:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 12: > Thanks for fixing the test! > Currently I'm wondering about DROP TABLE. Could you please add a > test that creates two Iceberg tables in the same HadoopCatalog, > DROP one of them and check that the other one still exists. > Could you please also add a DESCRIBE EXTENDED test to check the > TBLPROPERTIES and LOCATION? Hi Zoltan, I don't understand about "creates two Iceberg tables in the same HadoopCatalog". Do you mean create two managed table with HadoopCatalog in different location? Or create two external table with HadoopCatalog in same location? Or create a managed table, and create a external location to this managed table? -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 12 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Thu, 24 Sep 2020 08:17:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 12: Thanks for fixing the test! Currently I'm wondering about DROP TABLE. Could you please add a test that creates two Iceberg tables in the same HadoopCatalog, DROP one of them and check that the other one still exists. Could you please also add a DESCRIBE EXTENDED test to check the TBLPROPERTIES and LOCATION? -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 12 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Thu, 24 Sep 2020 08:07:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 12: Hi Zoltan, already modify test case which failed during local catalog mode, but Jenkins failed based on patch 12: https://jenkins.impala.io/job/pre-review-test/713/ Seems also related to IMPALA-9923. The failed log is load-tpcds-core-hive-generated-orc-def-block.sql.log, error log is 'Error: Error while compiling statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. java.io.IOException: Fail to get checksum, since file /test-warehouse/managed/tpcds.store_sales_orc_def/ss_sold_date_sk=2452612/base_005/_orc_acid_version is under construction. (state=08S01,code=1) java.sql.SQLException: Error while compiling statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. java.io.IOException: Fail to get checksum, since file /test-warehouse/managed/tpcds.store_sales_orc_def/ss_sold_date_sk=2452612/base_005/_orc_acid_version is under construction.' -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 12 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Thu, 24 Sep 2020 02:26:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 10: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6467/ -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 10 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Wed, 23 Sep 2020 12:49:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 11: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7248/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 11 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Wed, 23 Sep 2020 08:26:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
wangsheng has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. IMPALA-10164: Supporting HadoopCatalog for Iceberg table This patch mainly realizes creating Iceberg table by HadoopCatalog. We only supported HadoopTables api before this patch, but now we can use HadoopCatalog to create Iceberg table. When creating managed table, we can use SQL like this: CREATE TABLE default.iceberg_test ( level string, event_time timestamp, message string, ) STORED AS ICEBERG LOCATION 'hdfs://test-warehouse/hadoop_catalog_test' TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog'); We supported two values ('hadoop.catalog', 'hadoop.tables') for 'iceberg.catalog' now. If you don't specify this property in your SQL, default catalog type is 'hadoop.catalog'. As for external Iceberg table, you can use SQL like this: CREATE EXTERNAL TABLE default.iceberg_test_external STORED AS ICEBERG LOCATION 'hdfs://test-warehouse/hadoop_catalog_test' TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog', 'iceberg.table_name'='default.iceberg_test'); 'iceberg.table_name' is the managed Iceberg table name, just like 'kudu.table_name' when creating external Kudu table. If this property not been specified in SQL, Impala will use database and table name to load Iceberg table, which is 'default.iceberg_test_external' in above SQL. This property cannot be set with managed table. Testing: - Create table tests in functional_schema_template.sql - Iceberg table create test in test_iceberg.py - Iceberg table query test in test_scanners.py Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef --- M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/1-1-bc402da0-b562-4310-9001-06f9b6b0f9ae-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/6-6-d253aefa-65fc-4698-8f26-b155fc965cf6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/9-9-5d04b016-05e1-43fc-b4a0-0e0df52a5035-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-20b92523-c3b9-401d-b429-363c245dbe9c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-c86370cf-10a1-4e49-86dc-b094fe739aa6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-f32f86fa-286f-4cd3-8337-98685c48176d-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-b18d2bbc-46a2-4040-a4a8-7488447de3b6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-c9bda250-ed1c-4868-bbf1-f2aad65fa80c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/4-4-0ed77823-ded1-4a12-9e03-4027cd43966a-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00014-14-f698d7a4-245f-44d5-8a59-ed511854c8f8-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00015-15-7c1d5490-91f7-47bd-a3b6-e86caa7fe47d-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00019-19-d2ef5fcf-4346-421f-b2ef-1f9d55fb4c84-0.parquet A
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 10: > The data loading failure was probably due to IMPALA-9923 so I > restarted the verify job. > However, later I've found that there were some failing Iceberg > tests in the dockerised environment, so it's probably reproducible > in local catalog mode: > > bin/start-impala-cluster.py --impalad_args --enable_minidumps=false > --impalad_args --use_local_catalog=true --catalogd_args > --catalog_topic_mode=minimal > > https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/3199/testReport/ ok, I will test local catalog mode in my own environment with these new test cases. -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 10 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Wed, 23 Sep 2020 07:37:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 9: The data loading failure was probably due to IMPALA-9923 so I restarted the verify job. However, later I've found that there were some failing Iceberg tests in the dockerised environment, so it's probably reproducible in local catalog mode: bin/start-impala-cluster.py --impalad_args --enable_minidumps=false --impalad_args --use_local_catalog=true --catalogd_args --catalog_topic_mode=minimal https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/3199/testReport/ -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 9 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Wed, 23 Sep 2020 07:34:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 10: I see some Iceberg test related failures as well. Can these be related to this change? https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/3199/ -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 10 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Wed, 23 Sep 2020 07:29:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 10: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 10 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Wed, 23 Sep 2020 07:27:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 10: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6467/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 10 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Wed, 23 Sep 2020 07:27:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 9: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6461/ -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 9 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Tue, 22 Sep 2020 18:52:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 9: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6461/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 9 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Tue, 22 Sep 2020 14:47:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 9: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 9 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Tue, 22 Sep 2020 14:47:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 8: Code-Review+2 (2 comments) Thx for your changes! I'm fine with this patch. Don't forget to open a Jira for rewriting iceber_file_format, please. (Added a comment about this) http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java: http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@69 PS3, Line 69: iceberg_file_format > I don't have a strong opinion on that, whichever works best for you. I see this was postponed to a later commit. I'm fine with that but could you open a Jira that we can keep track of? (please add impala-icegerg label) http://gerrit.cloudera.org:8080/#/c/16446/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test: http://gerrit.cloudera.org:8080/#/c/16446/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@191 PS2, Line 191: SELECT count(*) from hadoop_catalog_test_external > This test files are generated by spark-shell, if you want to test non-exter I see. Let's wait till the write support then. -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 8 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Tue, 22 Sep 2020 14:47:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 8: Code-Review+1 Thanks for applying the changes, LTGM! -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 8 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Tue, 22 Sep 2020 13:26:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7235/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 8 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Tue, 22 Sep 2020 12:11:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
wangsheng has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. IMPALA-10164: Supporting HadoopCatalog for Iceberg table This patch mainly realizes creating Iceberg table by HadoopCatalog. We only supported HadoopTables api before this patch, but now we can use HadoopCatalog to create Iceberg table. When creating managed table, we can use SQL like this: CREATE TABLE default.iceberg_test ( level string, event_time timestamp, message string, ) STORED AS ICEBERG LOCATION 'hdfs://test-warehouse/hadoop_catalog_test' TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog'); We supported two values ('hadoop.catalog', 'hadoop.tables') for 'iceberg.catalog' now. If you don't specify this property in your SQL, default catalog type is 'hadoop.catalog'. As for external Iceberg table, you can use SQL like this: CREATE EXTERNAL TABLE default.iceberg_test_external STORED AS ICEBERG LOCATION 'hdfs://test-warehouse/hadoop_catalog_test' TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog', 'iceberg.table_name'='default.iceberg_test'); 'iceberg.table_name' is the managed Iceberg table name, just like 'kudu.table_name' when creating external Kudu table. If this property not been specified in SQL, Impala will use database and table name to load Iceberg table, which is 'default.iceberg_test_external' in above SQL. This property cannot be set with managed table. Testing: - Create table tests in functional_schema_template.sql - Iceberg table create test in test_iceberg.py - Iceberg table query test in test_scanners.py Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef --- M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/1-1-bc402da0-b562-4310-9001-06f9b6b0f9ae-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/6-6-d253aefa-65fc-4698-8f26-b155fc965cf6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/9-9-5d04b016-05e1-43fc-b4a0-0e0df52a5035-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-20b92523-c3b9-401d-b429-363c245dbe9c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-c86370cf-10a1-4e49-86dc-b094fe739aa6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-f32f86fa-286f-4cd3-8337-98685c48176d-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-b18d2bbc-46a2-4040-a4a8-7488447de3b6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-c9bda250-ed1c-4868-bbf1-f2aad65fa80c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/4-4-0ed77823-ded1-4a12-9e03-4027cd43966a-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00014-14-f698d7a4-245f-44d5-8a59-ed511854c8f8-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00015-15-7c1d5490-91f7-47bd-a3b6-e86caa7fe47d-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00019-19-d2ef5fcf-4346-421f-b2ef-1f9d55fb4c84-0.parquet A
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7231/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 7 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Tue, 22 Sep 2020 02:04:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
wangsheng has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. IMPALA-10164: Supporting HadoopCatalog for Iceberg table This patch mainly realizes creating Iceberg table by HadoopCatalog. We only supported HadoopTables api before this patch, but now we can use HadoopCatalog to create Iceberg table. When creating managed table, we can use SQL like this: CREATE TABLE default.iceberg_test ( level string, event_time timestamp, message string, ) STORED AS ICEBERG LOCATION 'hdfs://test-warehouse/hadoop_catalog_test' TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog'); We supported two values ('hadoop.catalog', 'hadoop.tables') for 'iceberg.catalog' now. If you don't specify this property in your SQL, default catalog type is 'hadoop.catalog'. As for external Iceberg table, you can use SQL like this: CREATE EXTERNAL TABLE default.iceberg_test_external STORED AS ICEBERG LOCATION 'hdfs://test-warehouse/hadoop_catalog_test' TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog', 'iceberg.table_name'='default.iceberg_test'); 'iceberg.table_name' is the managed Iceberg table name, just like 'kudu.table_name' when creating external Kudu table. If this property not been specified in SQL, Impala will use database and table name to load Iceberg table, which is 'default.iceberg_test_external' in above SQL. This property cannot be set with managed table. Testing: - Create table tests in functional_schema_template.sql - Iceberg table create test in test_iceberg.py - Iceberg table query test in test_scanners.py Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef --- M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/1-1-bc402da0-b562-4310-9001-06f9b6b0f9ae-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/6-6-d253aefa-65fc-4698-8f26-b155fc965cf6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/9-9-5d04b016-05e1-43fc-b4a0-0e0df52a5035-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-20b92523-c3b9-401d-b429-363c245dbe9c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-c86370cf-10a1-4e49-86dc-b094fe739aa6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-f32f86fa-286f-4cd3-8337-98685c48176d-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-b18d2bbc-46a2-4040-a4a8-7488447de3b6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-c9bda250-ed1c-4868-bbf1-f2aad65fa80c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/4-4-0ed77823-ded1-4a12-9e03-4027cd43966a-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00014-14-f698d7a4-245f-44d5-8a59-ed511854c8f8-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00015-15-7c1d5490-91f7-47bd-a3b6-e86caa7fe47d-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00019-19-d2ef5fcf-4346-421f-b2ef-1f9d55fb4c84-0.parquet A
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7219/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 6 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Mon, 21 Sep 2020 14:58:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Hello Gabor Kaszab, Zoltan Borok-Nagy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16446 to look at the new patch set (#6). Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. IMPALA-10164: Supporting HadoopCatalog for Iceberg table This patch mainly realizes creating Iceberg table by HadoopCatalog. We only supported HadoopTables api before this patch, but now we can use HadoopCatalog to create Iceberg table. When creating managed table, we can use SQL like this: CREATE TABLE default.iceberg_test ( level string, event_time timestamp, message string, ) STORED AS ICEBERG LOCATION 'hdfs://test-warehouse/hadoop_catalog_test' TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog'); We supported two values ('hadoop.catalog', 'hadoop.tables') for 'iceberg.catalog' now. If you don't specify this property in your SQL, default catalog type is 'hadoop.catalog'. As for external Iceberg table, you can use SQL like this: CREATE EXTERNAL TABLE default.iceberg_test_external STORED AS ICEBERG LOCATION 'hdfs://test-warehouse/hadoop_catalog_test' TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog', 'iceberg.table_name'='default.iceberg_test'); 'iceberg.table_name' is the managed Iceberg table name, just like 'kudu.table_name' when creating external Kudu table. If this property not been specified in SQL, Impala will use database and table name to load Iceberg table, which is 'default.iceberg_test_external' in above SQL. This property cannot be set with managed table. Testing: - Create table tests in functional_schema_template.sql - Iceberg table create test in test_iceberg.py - Iceberg table query test in test_scanners.py Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef --- M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/1-1-bc402da0-b562-4310-9001-06f9b6b0f9ae-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/6-6-d253aefa-65fc-4698-8f26-b155fc965cf6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/9-9-5d04b016-05e1-43fc-b4a0-0e0df52a5035-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-20b92523-c3b9-401d-b429-363c245dbe9c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-c86370cf-10a1-4e49-86dc-b094fe739aa6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-f32f86fa-286f-4cd3-8337-98685c48176d-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-b18d2bbc-46a2-4040-a4a8-7488447de3b6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-c9bda250-ed1c-4868-bbf1-f2aad65fa80c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/4-4-0ed77823-ded1-4a12-9e03-4027cd43966a-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00014-14-f698d7a4-245f-44d5-8a59-ed511854c8f8-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00015-15-7c1d5490-91f7-47bd-a3b6-e86caa7fe47d-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00019-19-d2ef5fcf-4346-421f-b2ef-1f9d55fb4c84-0.parquet A
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7218/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 5 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Mon, 21 Sep 2020 14:17:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
wangsheng has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. IMPALA-10164: Supporting HadoopCatalog for Iceberg table This patch mainly realizes creating Iceberg table by HadoopCatalog. We only supported HadoopTables api before this patch, but now we can use HadoopCatalog to create Iceberg table. When creating managed table, we can use SQL like this: CREATE TABLE default.iceberg_test ( level string, event_time timestamp, message string, ) STORED AS ICEBERG LOCATION 'hdfs://test-warehouse/hadoop_catalog_test' TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog'); We supported two values ('hadoop.catalog', 'hadoop.tables') for 'iceberg.catalog' now. If you don't specify this property in your SQL, default catalog type is 'hadoop.catalog'. As for external Iceberg table, you can use SQL like this: CREATE EXTERNAL TABLE default.iceberg_test_external STORED AS ICEBERG LOCATION 'hdfs://test-warehouse/hadoop_catalog_test' TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog', 'iceberg.table_name'='default.iceberg_test'); 'iceberg.table_name' is the managed Iceberg table name, just like 'kudu.table_name' when creating external Kudu table. If this property not been specified in SQL, Impala will use database and table name to load Iceberg table, which is 'default.iceberg_test_external' in above SQL. This property cannot be set with managed table. Testing: - Create table tests in functional_schema_template.sql - Iceberg table create test in test_iceberg.py - Iceberg table query test in test_scanners.py Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef --- M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/1-1-bc402da0-b562-4310-9001-06f9b6b0f9ae-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/6-6-d253aefa-65fc-4698-8f26-b155fc965cf6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/9-9-5d04b016-05e1-43fc-b4a0-0e0df52a5035-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-20b92523-c3b9-401d-b429-363c245dbe9c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-c86370cf-10a1-4e49-86dc-b094fe739aa6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-f32f86fa-286f-4cd3-8337-98685c48176d-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-b18d2bbc-46a2-4040-a4a8-7488447de3b6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-c9bda250-ed1c-4868-bbf1-f2aad65fa80c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/4-4-0ed77823-ded1-4a12-9e03-4027cd43966a-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00014-14-f698d7a4-245f-44d5-8a59-ed511854c8f8-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00015-15-7c1d5490-91f7-47bd-a3b6-e86caa7fe47d-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00019-19-d2ef5fcf-4346-421f-b2ef-1f9d55fb4c84-0.parquet A
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 4: Code-Review+1 (2 comments) Thanks for applying the changes! http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java: http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@69 PS3, Line 69: iceberg_file_format > Shall we change this property name in this patch? Maybe in another patch is I don't have a strong opinion on that, whichever works best for you. http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@78 PS3, Line 78: // Each table id increase from zero : HadoopCatalog catalo > Done Maybe these functions should be private as well. I guess we only want them to be called from the public createTable() at L54. Probably we should also rename them to createTableByHadoopTables() and createTableByHadoopCatalog(). -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 4 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Mon, 21 Sep 2020 13:26:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7217/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 4 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Mon, 21 Sep 2020 13:18:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 4: (7 comments) I've already rename two new properties to: 'iceberg.catalog' and 'iceberg.table_name', I'm not sure if this is appropriate to rename 'iceberg_file_format' to 'iceberg.file_format' in this patch. http://gerrit.cloudera.org:8080/#/c/16446/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16446/3//COMMIT_MSG@29 PS3, Line 29: . > nit: Shouldn't we use 'iceberg.table_name' just like 'kudu.table_name'. In Done http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java: http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@69 PS3, Line 69: iceberg_file_format > If that's not too much pain, maybe we should also change it to "iceberg.fil Shall we change this property name in this patch? Maybe in another patch is better? http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@72 PS3, Line 72: iceberg.catalog > In the commit message you mention "iceberg.catalog", not "iceberg_catalog". Done http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@78 PS3, Line 78: // Each table id increase from zero : HadoopCatalog catalo > You already set it at L57. Done http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/util/IcebergUtil.java File fe/src/main/java/org/apache/impala/util/IcebergUtil.java: http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@102 PS3, Line 102: { > nit: probably 'Catalog.DEFAULT_DB' should be used Done http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@149 PS3, Line 149: .contains( > nit: Catalog.DEFAULT_DB Done http://gerrit.cloudera.org:8080/#/c/16446/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test: http://gerrit.cloudera.org:8080/#/c/16446/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@221 PS3, Line 221: SELECT count(*) from hadoop_catalog_test_external > Please add "SET TIMEZONE=UTC;" to tests that involve timestamps. For furthe Done -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 4 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Mon, 21 Sep 2020 12:56:24 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
wangsheng has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. IMPALA-10164: Supporting HadoopCatalog for Iceberg table This patch mainly realizes creating Iceberg table by HadoopCatalog. We only supported HadoopTables api before this patch, but now we can use HadoopCatalog to create Iceberg table. When creating managed table, we can use SQL like this: CREATE TABLE default.iceberg_test ( level string, event_time timestamp, message string, ) STORED AS ICEBERG LOCATION 'hdfs://test-warehouse/hadoop_catalog_test' TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog'); We supported two values ('hadoop.catalog', 'hadoop.tables') for 'iceberg.catalog' now. If you don't specify this property in your SQL, default catalog type is 'hadoop.catalog'. As for external Iceberg table, you can use SQL like this: CREATE EXTERNAL TABLE default.iceberg_test_external STORED AS ICEBERG LOCATION 'hdfs://test-warehouse/hadoop_catalog_test' TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog', 'iceberg.table_name'='default.iceberg_test'); 'iceberg.table_name' is the managed Iceberg table name, just like 'kudu.table_name' when creating external Kudu table. If this property not been specified in SQL, Impala will use database and table name to load Iceberg table, which is 'default.iceberg_test_external' in above SQL. This property cannot be set with managed table. Testing: - Create table tests in functional_schema_template.sql - Iceberg table create test in test_iceberg.py - Iceberg table query test in test_scanners.py Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef --- M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/1-1-bc402da0-b562-4310-9001-06f9b6b0f9ae-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/6-6-d253aefa-65fc-4698-8f26-b155fc965cf6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/9-9-5d04b016-05e1-43fc-b4a0-0e0df52a5035-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-20b92523-c3b9-401d-b429-363c245dbe9c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-c86370cf-10a1-4e49-86dc-b094fe739aa6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-f32f86fa-286f-4cd3-8337-98685c48176d-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-b18d2bbc-46a2-4040-a4a8-7488447de3b6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-c9bda250-ed1c-4868-bbf1-f2aad65fa80c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/4-4-0ed77823-ded1-4a12-9e03-4027cd43966a-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00014-14-f698d7a4-245f-44d5-8a59-ed511854c8f8-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00015-15-7c1d5490-91f7-47bd-a3b6-e86caa7fe47d-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00019-19-d2ef5fcf-4346-421f-b2ef-1f9d55fb4c84-0.parquet A
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 3: (6 comments) Done http://gerrit.cloudera.org:8080/#/c/16446/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16446/2//COMMIT_MSG@21 PS2, Line 21: We supported two values ('hadoop.catalog', 'hadoop.tables') for : 'iceberg.catalog' > I'd list the currently possible values ('hadoop.catalog', 'hadoop.table'). Done http://gerrit.cloudera.org:8080/#/c/16446/2//COMMIT_MSG@22 PS2, Line 22: now. If you don't specify this property i > For me talking about external tables would be more readable in a separate p Done http://gerrit.cloudera.org:8080/#/c/16446/2//COMMIT_MSG@28 PS2, Line 28: > nit: please avoid using tabs for indentation Done http://gerrit.cloudera.org:8080/#/c/16446/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test: http://gerrit.cloudera.org:8080/#/c/16446/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test@56 PS2, Line 56: TBLPROPERTIES('iceberg_table_name'='fake_db.fake_table'); > Is it possible to create an external table using some invalid 'iceberg_tabl Done http://gerrit.cloudera.org:8080/#/c/16446/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test: http://gerrit.cloudera.org:8080/#/c/16446/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@191 PS2, Line 191: SELECT count(*) from hadoop_catalog_test_external > I know that we can't write Iceberg tables at the moment but is there any wa This test files are generated by spark-shell, if you want to test non-external tables with 'hadoop_catalog', we need to create Iceberg table by Impala first, and then generate data by spark-shell. This maybe difficult to implement by impala test. Now we can only query empty Iceberg table created by Impala. http://gerrit.cloudera.org:8080/#/c/16446/2/testdata/workloads/functional-query/queries/QueryTest/show-create-table.test File testdata/workloads/functional-query/queries/QueryTest/show-create-table.test: http://gerrit.cloudera.org:8080/#/c/16446/2/testdata/workloads/functional-query/queries/QueryTest/show-create-table.test@585 PS2, Line 585: TBLPROPERTIES('iceberg_file_format'='parquet', 'iceberg_catalog'='hadoop.catalog') > Could you also add test coverage here for the default value of 'iceberg_cat Done -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 3 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Mon, 21 Sep 2020 12:32:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 3: (7 comments) Thanks for your work again! This also makes it easier for us to add HiveCatalog later on. I only found some minor issues. http://gerrit.cloudera.org:8080/#/c/16446/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16446/3//COMMIT_MSG@29 PS3, Line 29: _ nit: Shouldn't we use 'iceberg.table_name' just like 'kudu.table_name'. In the above example we use a dot for specifying the catalog, i.e. 'iceberg.catalog'. This is also what Hiveberg does, so I prefer the dot notation to be consistent. http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java: http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@69 PS3, Line 69: iceberg_file_format If that's not too much pain, maybe we should also change it to "iceberg.file_format" for consistency. http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@72 PS3, Line 72: iceberg_catalog In the commit message you mention "iceberg.catalog", not "iceberg_catalog". Hiveberg also uses the dot syntax: https://github.com/ExpediaGroup/hiveberg For Kudu options we also use the dot syntax, so maybe we should also use the dot syntax for Iceberg properties. http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@78 PS3, Line 78: // Each table id increase from zero : iThreadLocal.set(0); You already set it at L57. http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/util/IcebergUtil.java File fe/src/main/java/org/apache/impala/util/IcebergUtil.java: http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@102 PS3, Line 102: "default" nit: probably 'Catalog.DEFAULT_DB' should be used http://gerrit.cloudera.org:8080/#/c/16446/3/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@149 PS3, Line 149: "default." nit: Catalog.DEFAULT_DB http://gerrit.cloudera.org:8080/#/c/16446/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test: http://gerrit.cloudera.org:8080/#/c/16446/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@221 PS3, Line 221: where event_time > to_timestamp('2020-01-01 09:00:00','-MM-dd HH:mm:ss') Please add "SET TIMEZONE=UTC;" to tests that involve timestamps. For further info please see IMPALA-10158. -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 3 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Mon, 21 Sep 2020 11:58:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7213/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 3 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Mon, 21 Sep 2020 10:05:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
wangsheng has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. IMPALA-10164: Supporting HadoopCatalog for Iceberg table This patch mainly realizes creating Iceberg table by HadoopCatalog. We only supported HadoopTables api before this patch, but now we can use HadoopCatalog to create Iceberg table. When creating managed table, we can use SQL like this: CREATE TABLE default.iceberg_test ( level string, event_time timestamp, message string, ) STORED AS ICEBERG LOCATION 'hdfs://test-warehouse/hadoop_catalog_test' TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog'); We supported two values ('hadoop.catalog', 'hadoop.tables') for 'iceberg.catalog' now. If you don't specify this property in your SQL, default catalog type is 'hadoop.catalog'. As for external Iceberg table, you can use SQL like this: CREATE EXTERNAL TABLE default.iceberg_test_external STORED AS ICEBERG LOCATION 'hdfs://test-warehouse/hadoop_catalog_test' TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog', 'iceberg_table_name'='default.iceberg_test'); 'iceberg_table_name' is the managed Iceberg table name, just like 'kudu.table_name' when creating external Kudu table. If this property not been specified in SQL, Impala will use database and table name to load Iceberg table, which is 'default.iceberg_test_external' in above SQL. This property cannot be set with managed table. Testing: - Create table tests in functional_schema_template.sql - Iceberg table create test in test_iceberg.py - Iceberg table query test in test_scanners.py Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef --- M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/1-1-bc402da0-b562-4310-9001-06f9b6b0f9ae-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/6-6-d253aefa-65fc-4698-8f26-b155fc965cf6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/9-9-5d04b016-05e1-43fc-b4a0-0e0df52a5035-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-20b92523-c3b9-401d-b429-363c245dbe9c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-c86370cf-10a1-4e49-86dc-b094fe739aa6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-f32f86fa-286f-4cd3-8337-98685c48176d-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-b18d2bbc-46a2-4040-a4a8-7488447de3b6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-c9bda250-ed1c-4868-bbf1-f2aad65fa80c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/4-4-0ed77823-ded1-4a12-9e03-4027cd43966a-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00014-14-f698d7a4-245f-44d5-8a59-ed511854c8f8-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00015-15-7c1d5490-91f7-47bd-a3b6-e86caa7fe47d-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00019-19-d2ef5fcf-4346-421f-b2ef-1f9d55fb4c84-0.parquet A
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 2: (6 comments) Thanks for the changes! I have some minor ones but the patch in general is in a good shape. I'll let Zoltan to take a look also. http://gerrit.cloudera.org:8080/#/c/16446/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16446/2//COMMIT_MSG@21 PS2, Line 21: If you don't specify this property in your SQL, default catalog type is : 'hadoop.catalog'. I'd list the currently possible values ('hadoop.catalog', 'hadoop.table'). http://gerrit.cloudera.org:8080/#/c/16446/2//COMMIT_MSG@22 PS2, Line 22: And if you want to create external table, For me talking about external tables would be more readable in a separate paragraph. http://gerrit.cloudera.org:8080/#/c/16446/2//COMMIT_MSG@28 PS2, Line 28: nit: please avoid using tabs for indentation http://gerrit.cloudera.org:8080/#/c/16446/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test: http://gerrit.cloudera.org:8080/#/c/16446/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test@56 PS2, Line 56: TBLPROPERTIES('iceberg_table_name'='fake_db.fake_table'); Is it possible to create an external table using some invalid 'iceberg_table_name'? Is there such a scenario and what is expected if there is? http://gerrit.cloudera.org:8080/#/c/16446/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test: http://gerrit.cloudera.org:8080/#/c/16446/2/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@191 PS2, Line 191: SELECT count(*) from hadoop_catalog_test_external I know that we can't write Iceberg tables at the moment but is there any way to extend these tests for non-external tables that use 'hadoop_catalog'? http://gerrit.cloudera.org:8080/#/c/16446/2/testdata/workloads/functional-query/queries/QueryTest/show-create-table.test File testdata/workloads/functional-query/queries/QueryTest/show-create-table.test: http://gerrit.cloudera.org:8080/#/c/16446/2/testdata/workloads/functional-query/queries/QueryTest/show-create-table.test@585 PS2, Line 585: TBLPROPERTIES('iceberg_file_format'='parquet', 'iceberg_catalog'='hadoop.catalog') Could you also add test coverage here for the default value of 'iceberg_catalog'? -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 2 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Thu, 17 Sep 2020 16:32:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 2: (8 comments) Hi Gabor, thanks for your review, I've already modify code. http://gerrit.cloudera.org:8080/#/c/16446/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16446/1//COMMIT_MSG@20 PS1, Line 20: og'); : If you don't spe > Is there any consideration behind making hadoop.tables the default? AFAIK h Since we only supported HadoopTables before, I use this as default catalog type, there is no more reasons. You are right, hadoop.catalog gives more than hadoop.tables, so if necessary, I will use hadoop.catalog as default catalog type and modified related files. http://gerrit.cloudera.org:8080/#/c/16446/1/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: http://gerrit.cloudera.org:8080/#/c/16446/1/common/thrift/CatalogObjects.thrift@96 PS1, Line 96: identified > nit: typo Done http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java: http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@244 PS1, Line 244: at getIcebergFile > I find it confusing that IcebergUtil also has a function with the same name Done http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java: http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@71 PS1, Line 71: // Iceberg catalog type key in tblproperties > nit: pls add comment similarly to ICEBERG_FILE_FORMAT above. Done http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java File fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java: http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@111 PS1, Line 111: return tableParams_.icebergCatalog_; : } : : @ > FeIcebergTable has 2 implementations, this class and IcebergTable. Both hav I add icebergCatalog_ in TableParams in LocalIcebergTable.java, and return 'tableParams_.icebergCatalog_' in getIcebergCatalog(), just like getIcebergTableLocation(). I found that this way is also suitable for getIcebergFileFormat(), I'm not sure to modify in this patch. http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java: http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@a93 PS1, Line 93: > As I see this patch also contains some modifications related to Iceberg tab Done http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@54 PS1, Line 54: TIcebergCatalo > Using TIcebergCatalog as a param would be easier to understand what is the Done http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@63 PS1, Line 63: Preconditions.checkArgument(catalog == TIcebergCatalog.HADOOP_TABLES); > Would worth a Preconditions check in the else branch that icebergCatalog == Done -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 2 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Tue, 15 Sep 2020 13:39:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7180/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 2 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 15 Sep 2020 13:13:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
wangsheng has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. IMPALA-10164: Supporting HadoopCatalog for Iceberg table This patch mainly realizes creating Iceberg table by HadoopCatalog. We only supported HadoopTables api before this patch, but now we can use HadoopCatalog to create Iceberg table. When creating managed table, we can use SQL like this: CREATE TABLE default.iceberg_test ( level string, event_time timestamp, message string, ) STORED AS ICEBERG LOCATION 'hdfs://test-warehouse/hadoop_catalog_test' TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog'); If you don't specify this property in your SQL, default catalog type is 'hadoop.catalog'. And if you want to create external table, you can use SQL like this: CREATE EXTERNAL TABLE default.iceberg_test_external STORED AS ICEBERG LOCATION 'hdfs://test-warehouse/hadoop_catalog_test' TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog', 'iceberg_table_name'='default.iceberg_test'); 'iceberg_table_name' is the managed Iceberg table name, just like 'kudu.table_name' when creating external Kudu table. If this property not been specified in SQL, Impala will use database and table name to load Iceberg table, which is 'default.iceberg_test_external' in above SQL. This property cannot be set with managed table. Testing: - Create table tests in functional_schema_template.sql - Iceberg table create test in test_iceberg.py - Iceberg table query test in test_scanners.py Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef --- M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/1-1-bc402da0-b562-4310-9001-06f9b6b0f9ae-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/6-6-d253aefa-65fc-4698-8f26-b155fc965cf6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/9-9-5d04b016-05e1-43fc-b4a0-0e0df52a5035-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-20b92523-c3b9-401d-b429-363c245dbe9c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-c86370cf-10a1-4e49-86dc-b094fe739aa6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-f32f86fa-286f-4cd3-8337-98685c48176d-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-b18d2bbc-46a2-4040-a4a8-7488447de3b6-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-c9bda250-ed1c-4868-bbf1-f2aad65fa80c-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/4-4-0ed77823-ded1-4a12-9e03-4027cd43966a-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00014-14-f698d7a4-245f-44d5-8a59-ed511854c8f8-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00015-15-7c1d5490-91f7-47bd-a3b6-e86caa7fe47d-0.parquet A testdata/data/iceberg_test/hadoop_catalog/hadoop_catalog_test/functional_parquet/hadoop_catalog_test/data/event_time_hour=2020-01-01-09/action=click/00019-19-d2ef5fcf-4346-421f-b2ef-1f9d55fb4c84-0.parquet A
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/16446/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16446/1//COMMIT_MSG@20 PS1, Line 20: default catalog type is : 'hadoop.tables'. Is there any consideration behind making hadoop.tables the default? AFAIK hadoop.catalog gives more than hadoop.tables so let's consider making it as a default. But making one step forward, in the future we might want to introduce hive.catalog as well that works with S3 too so in the long run that should be the default in my opinion. -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 1 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 14 Sep 2020 16:32:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 1: (8 comments) Thanks for taking care of this patch! I have some comments but nothing serious as the patch is in good shape in my opinion. What I miss is some test coverage to cover at least the new syntax. http://gerrit.cloudera.org:8080/#/c/16446/1/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: http://gerrit.cloudera.org:8080/#/c/16446/1/common/thrift/CatalogObjects.thrift@96 PS1, Line 96: identitied nit: typo http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java: http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@244 PS1, Line 244: getIcebergCatalog I find it confusing that IcebergUtil also has a function with the same name but different parameterisation. Could you consider moving this one to that class next to that function with the same name? http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java: http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@71 PS1, Line 71: public static final String ICEBERG_CATALOG = "iceberg.catalog"; nit: pls add comment similarly to ICEBERG_FILE_FORMAT above. http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java File fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java: http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@111 PS1, Line 111: @Override : public TIcebergCatalog getIcebergCatalog() { : return icebergCatalog_; : } FeIcebergTable has 2 implementations, this class and IcebergTable. Both have the same implementation of getIcebergCatalog(). Please consider move the implementation of this function into FeIcebergTable. http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@183 PS1, Line 183: getIcebergCatalog If getIcebergCatalog(msTable) was moved to IcebergUtil then we could use it here. http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java: http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@a93 PS1, Line 93: As I see this patch also contains some modifications related to Iceberg table location. Could you a) move it to a separate patch b) mention these changes in the commit message? http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@54 PS1, Line 54: String catalog Using TIcebergCatalog as a param would be easier to understand what is the intention of this param. http://gerrit.cloudera.org:8080/#/c/16446/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@63 PS1, Line 63: } else { Would worth a Preconditions check in the else branch that icebergCatalog == HADOOP_TABLES -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 1 Gerrit-Owner: wangsheng Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 14 Sep 2020 16:16:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 ) Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7173/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 1 Gerrit-Owner: wangsheng Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 14 Sep 2020 13:47:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10164: Supporting HadoopCatalog for Iceberg table
wangsheng has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16446 Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table .. IMPALA-10164: Supporting HadoopCatalog for Iceberg table This patch mainly realizes creating Iceberg table by HadoopCatalog. We only supported HadoopTables api before this patch, but now we can use HadoopCatalog to create Iceberg table like this: CREATE EXTERNAL TABLE default.iceberg_test ( level string, event_time timestamp, message string, ) STORED AS ICEBERG LOCATION 'hdfs://xxx' TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog'); If you don't specify this property in your SQL, default catalog type is 'hadoop.tables'. Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef --- M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java 9 files changed, 158 insertions(+), 33 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/16446/1 -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 1 Gerrit-Owner: wangsheng