Hello Tamas Mate, [email protected], Zoltan Borok-Nagy, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20077

to look at the new patch set (#8).

Change subject: IMPALA-11013: Support 'MIGRATE TABLE' for external Hive tables
......................................................................

IMPALA-11013: Support 'MIGRATE TABLE' for external Hive tables

This patch implements the migration from legacy Hive tables to Iceberg
tables. The target Iceberg tables inherit the location of the original
Hive tables. The Hive table has to be a non-transactional table.

To migrate a Hive format table stored in a distributed file system or
object store to an Iceberg table, use the command:

ALTER TABLE [dbname.]table_name CONVERT TO ICEBERG [TBLPROPERTIES(...)];

Currently only 'iceberg.catalog' is allowed as a table property.

For example:
     - ALTER TABLE hive_table CONVERT TO ICEBERG;
     - ALTER TABLE hive_table CONVERT TO ICEBERG TBLPROPERTIES(
       'iceberg.catalog' = 'hadoop.tables');
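The TBLPROPERTIES restriction can be pictured as a simple allow-list
check. This is a minimal sketch, not the analysis code in
ConvertTableToIcebergStmt; the class and method names are hypothetical,
and treating keys as exact, case-sensitive matches is an assumption.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not Impala's actual analyzer code: rejects every
// TBLPROPERTIES key except 'iceberg.catalog'. Exact (case-sensitive)
// matching is an assumption.
final class ConvertToIcebergProps {
  static final Set<String> ALLOWED_KEYS = Set.of("iceberg.catalog");

  // Throws IllegalArgumentException if any key is outside the allow-list.
  static void validate(Map<String, String> tblProperties) {
    for (String key : tblProperties.keySet()) {
      if (!ALLOWED_KEYS.contains(key)) {
        throw new IllegalArgumentException(
            "Unsupported table property for CONVERT TO ICEBERG: " + key);
      }
    }
  }

  public static void main(String[] args) {
    // Mirrors the second example: TBLPROPERTIES('iceberg.catalog' = 'hadoop.tables')
    validate(Map.of("iceberg.catalog", "hadoop.tables"));
    try {
      validate(Map.of("write.format.default", "orc"));
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```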

The HDFS table to be converted must meet the following requirements:
     - the table is not a transactional table
     - the InputFormat is one of PARQUET, ORC or AVRO
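The two preconditions can be expressed as a single check. This is a
hedged sketch: the enum, class and method names are illustrative rather
than Impala's catalog API, and it assumes a transactional table is
recognized by Hive's 'transactional'='true' table property.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical precondition check; names are illustrative, not Impala's API.
// Assumes transactional tables carry Hive's 'transactional'='true' property.
final class MigrationPreconditions {
  enum FileFormat { PARQUET, ORC, AVRO, TEXT, SEQUENCE_FILE, RC_FILE }

  static final Set<FileFormat> SUPPORTED =
      Set.of(FileFormat.PARQUET, FileFormat.ORC, FileFormat.AVRO);

  // Returns an error message if the table cannot be migrated, empty otherwise.
  static Optional<String> check(Map<String, String> tblProperties, FileFormat format) {
    // equalsIgnoreCase(null) is false, so a missing property means non-transactional.
    if ("true".equalsIgnoreCase(tblProperties.get("transactional"))) {
      return Optional.of("Transactional tables cannot be converted to Iceberg");
    }
    if (!SUPPORTED.contains(format)) {
      return Optional.of("Unsupported file format for migration: " + format);
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    System.out.println(check(Map.of(), FileFormat.PARQUET).orElse("ok"));
    System.out.println(check(Map.of("transactional", "true"), FileFormat.ORC).orElse("ok"));
  }
}
```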

This is an in-place migration, so the original data files of the legacy
Hive table are re-used and are not moved, copied or re-created by this
operation. The new Iceberg table will have the 'external.table.purge'
property set to true after the migration.

The NUM_THREADS_FOR_TABLE_MIGRATION query option controls the maximum
number of threads used to execute the table conversion. The default
value is 1, meaning that the conversion runs on a single thread. Valid
values are in the range [0, 1024]. Zero means that the number of CPU
cores is used as the degree of parallelism. A value greater than zero
sets the number of threads used for the conversion, capped at the
number of CPU cores.
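The resulting degree of parallelism can be summarized in a few lines.
This is a minimal sketch of the rule described above, not the actual
Impala implementation; the class and method names are hypothetical.

```java
// Hypothetical helper mirroring the NUM_THREADS_FOR_TABLE_MIGRATION rule;
// names are illustrative, not the actual Impala implementation.
final class MigrationThreads {
  static final int MIN_THREADS = 0;
  static final int MAX_THREADS = 1024;

  static int effectiveThreads(int numThreadsOption, int cpuCores) {
    if (numThreadsOption < MIN_THREADS || numThreadsOption > MAX_THREADS) {
      throw new IllegalArgumentException(
          "NUM_THREADS_FOR_TABLE_MIGRATION must be in [0, 1024]: " + numThreadsOption);
    }
    // 0 means one thread per CPU core.
    if (numThreadsOption == 0) return cpuCores;
    // Positive values are honored but never exceed the core count.
    return Math.min(numThreadsOption, cpuCores);
  }

  public static void main(String[] args) {
    int cores = Runtime.getRuntime().availableProcessors();
    System.out.println("default (1): " + effectiveThreads(1, cores));
    System.out.println("zero: " + effectiveThreads(0, cores));
  }
}
```

With 8 cores, for example, a value of 16 yields 8 threads, while 4
yields 4.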

Process of migration:
 - Step 1: Set table properties on the HDFS table, e.g.
           'external.table.purge'=false, so that dropping the temporary
           table in Step 7 does not delete the data files.
 - Step 2: Rename the HDFS table to a temporary name using the format
           "<original_table_name>_tmp_<random_ID>".
 - Step 3: Refresh the renamed HDFS table.
 - Step 4: Create an external Iceberg table through the Iceberg API
           using the data files of the HDFS table.
 - Step 5 (Optional): For an Iceberg table in Hadoop Tables, run a
           CREATE TABLE query to add the Iceberg table to HMS as well.
 - Step 6 (Optional): For an Iceberg table in Hadoop Tables, set the
           'external.table.purge' property to true in an ALTER TABLE
           query.
 - Step 7: Drop the temporary HDFS table.
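The ordering of the steps, including the two that only apply to Hadoop
Tables, can be sketched as a plan builder. This is illustrative only:
the class and method names are hypothetical, the Iceberg table is
assumed to take over the original table name, and the random-ID scheme
(8 hex characters taken from a UUID) is an assumption, since the actual
ID format is not specified here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical sketch of the migration step order; not Impala's code.
// Assumption: the random ID is 8 hex characters derived from a UUID.
final class MigrationPlan {
  static String tempTableName(String originalName) {
    String randomId = UUID.randomUUID().toString().replace("-", "").substring(0, 8);
    return originalName + "_tmp_" + randomId;
  }

  static List<String> steps(String table, boolean hadoopTablesCatalog) {
    String tmp = tempTableName(table);
    List<String> steps = new ArrayList<>();
    steps.add("Set 'external.table.purge'='false' on " + table);
    steps.add("Rename " + table + " to " + tmp);
    steps.add("Refresh " + tmp);
    steps.add("Create Iceberg table " + table + " from the data files of " + tmp);
    if (hadoopTablesCatalog) {
      // Steps 5 and 6 only apply to tables in Hadoop Tables.
      steps.add("Register " + table + " in HMS via CREATE TABLE");
      steps.add("Set 'external.table.purge'='true' on " + table);
    }
    steps.add("Drop " + tmp);
    return steps;
  }

  public static void main(String[] args) {
    steps("hive_table", true).forEach(System.out::println);
  }
}
```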

Testing:
 - Add e2e tests
 - Add FE UTs
 - Manually tested the runtime of migrating an unpartitioned table with
   10k data files. The migration took around 10-13s.

Co-authored-by: lipenglin <[email protected]>

Change-Id: Iacdad996d680fe545cc9a45e6bc64a348a64cd80
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/Frontend.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M common/thrift/Types.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
A fe/src/main/java/org/apache/impala/analysis/ConvertTableToIcebergStmt.java
M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java
M fe/src/main/java/org/apache/impala/analysis/QueryStringBuilder.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalog.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHadoopCatalog.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHadoopTables.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
A fe/src/main/java/org/apache/impala/util/MigrateTableUtil.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
A testdata/workloads/functional-query/queries/QueryTest/iceberg-migrate-from-external-hdfs-tables.test
M tests/authorization/test_ranger.py
M tests/query_test/test_iceberg.py
34 files changed, 1,399 insertions(+), 61 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/20077/8
--
To view, visit http://gerrit.cloudera.org:8080/20077
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iacdad996d680fe545cc9a45e6bc64a348a64cd80
Gerrit-Change-Number: 20077
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tamas Mate <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
