Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20077 )

Change subject: IMPALA-11013: Support 'MIGRATE TABLE' for external HDFS tables
......................................................................


Patch Set 5:

(19 comments)

Thanks for continuing this change, Gabor! Left a few comments, but nothing 
serious.

http://gerrit.cloudera.org:8080/#/c/20077/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20077/5//COMMIT_MSG@9
PS5, Line 9: HDFS
I think 'Hive tables' or 'legacy Hive tables' are more precise, as the tables 
can be stored anywhere, e.g. in object stores.


http://gerrit.cloudera.org:8080/#/c/20077/5//COMMIT_MSG@11
PS5, Line 11: an external table
I think 'non-transactional' is less confusing here.


http://gerrit.cloudera.org:8080/#/c/20077/5//COMMIT_MSG@13
PS5, Line 13: HDFS
'in a distributed filesystem or object store'


http://gerrit.cloudera.org:8080/#/c/20077/5//COMMIT_MSG@18
PS5, Line 18: as
is


http://gerrit.cloudera.org:8080/#/c/20077/5//COMMIT_MSG@26
PS5, Line 26:      - table is an external table
            :      - table is not a transactional table
>From Hive3+ external table == not a transactional table, so the first two 
>points mean the same. I propose to keep the 'not a transactional table' as 
>this might be less confusing.


http://gerrit.cloudera.org:8080/#/c/20077/5//COMMIT_MSG@30
PS5, Line 30: This an
missing verb


http://gerrit.cloudera.org:8080/#/c/20077/5//COMMIT_MSG@32
PS5, Line 32: new Iceberg table will have the 'external.table.purge' property 
set to
            : true after the migration.
Why don't we keep the original value? (though currently Iceberg tables with 
external.table.purge are buggy, but that's a different story)


http://gerrit.cloudera.org:8080/#/c/20077/5//COMMIT_MSG@42
PS5, Line 42: <random_ID>
Can we use the query id instead?


http://gerrit.cloudera.org:8080/#/c/20077/5//COMMIT_MSG@46
PS5, Line 46:  - Step 5: For an Iceberg table in Hadoop Catalog, run a CREATE 
TABLE
            :            query to add the Iceberg table to HMS as well.
            :  - Step 6: For an Iceberg table in Hadoop Catalog, set the
            :            'external.table.purge' property to true in an ALTER 
TABLE
            :            query.
So are these not relevant for tables in the HiveCatalog? Then probably it would 
be better to mark them as optional steps.


http://gerrit.cloudera.org:8080/#/c/20077/5//COMMIT_MSG@56
PS5, Line 56:  - Manually tested the runtime performance for a table that is
            :    unpartitioned and has 10k data files. The runtime is around 
10-13s.
Could you please also try to convert tpcds.store_sales at a scale of 10?


http://gerrit.cloudera.org:8080/#/c/20077/5/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/20077/5/be/src/service/client-request-state.cc@2240
PS5, Line 2240: some table properties
Could you add a pointer to the Frontend code that sets these table properties?


http://gerrit.cloudera.org:8080/#/c/20077/5/be/src/service/client-request-state.cc@2240
PS5, Line 2240: the
nit: redundant 'the'


http://gerrit.cloudera.org:8080/#/c/20077/5/be/src/service/client-request-state.cc@2281
PS5, Line 2281: child_queries
This hides the outer child_queries which can be confusing later, so maybe 
rename this to 'iceberg_child_queries'?


http://gerrit.cloudera.org:8080/#/c/20077/5/be/src/service/client-request-state.cc@2283
PS5, Line 2283: params.__isset.create_iceberg_table_query
Isn't it always set?


http://gerrit.cloudera.org:8080/#/c/20077/5/fe/src/main/java/org/apache/impala/analysis/ConvertTableToIcebergStmt.java
File fe/src/main/java/org/apache/impala/analysis/ConvertTableToIcebergStmt.java:

http://gerrit.cloudera.org:8080/#/c/20077/5/fe/src/main/java/org/apache/impala/analysis/ConvertTableToIcebergStmt.java@90
PS5, Line 90:     if 
(!MetaStoreUtils.isExternalTable(table.getMetaStoreTable())) {
            :       throw new AnalysisException(
            :           "CONVERT TO ICEBERG is not supported for managed 
tables");
            :     }
            :
            :     if (table.getMetaStoreTable().getParameters() != null &&
            :         
AcidUtils.isTransactionalTable(table.getMetaStoreTable().getParameters())) {
            :       throw new AnalysisException(
            :           "CONVERT TO ICEBERG is not supported for transactional 
tables");
            :     }
These will be true/false at the same time. Maybe just keep the one about 
transactional tables?


http://gerrit.cloudera.org:8080/#/c/20077/5/fe/src/main/java/org/apache/impala/analysis/ConvertTableToIcebergStmt.java@115
PS5, Line 115:     if (TIcebergCatalog.HADOOP_CATALOG == 
IcebergUtil.getTIcebergCatalog(properties_)) {
             :       throw new AnalysisException("The Hadoop Catalog is not 
supported because the " +
             :           "location may change");
             :     }
The commit message mentions Hadoop Catalog.


http://gerrit.cloudera.org:8080/#/c/20077/5/fe/src/main/java/org/apache/impala/analysis/ConvertTableToIcebergStmt.java@120
PS5, Line 120: Iceberg issue #7612
Could you please add the link https://github.com/apache/iceberg/issues/7612 ?


http://gerrit.cloudera.org:8080/#/c/20077/5/fe/src/main/java/org/apache/impala/analysis/ConvertTableToIcebergStmt.java@126
PS5, Line 126:         if (partitionValue.contains("/")) {
             :           throw new AnalysisException ("Can't migrate table with 
'/' in the partition " +
             :               "values until Iceberg #7612 is fixed. '" + 
partitionValue + "'");
             :         }
Shouldn't we check for any UTF-encoded char?


http://gerrit.cloudera.org:8080/#/c/20077/5/fe/src/main/java/org/apache/impala/util/MigrateTableUtil.java
File fe/src/main/java/org/apache/impala/util/MigrateTableUtil.java:

http://gerrit.cloudera.org:8080/#/c/20077/5/fe/src/main/java/org/apache/impala/util/MigrateTableUtil.java@108
PS5, Line 108:       if (IcebergUtil.isHiveCatalog(props)) {
             :         catalog.dropTable(tableName.getDb(), tableName.getTbl(), 
false);
             :       }
Please add comment why we are dropping the table.



--
To view, visit http://gerrit.cloudera.org:8080/20077
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iacdad996d680fe545cc9a45e6bc64a348a64cd80
Gerrit-Change-Number: 20077
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tamas Mate <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Thu, 22 Jun 2023 19:14:26 +0000
Gerrit-HasComments: Yes

Reply via email to