Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/20077 )
Change subject: IMPALA-11013: Support 'MIGRATE TABLE' for external HDFS tables ...................................................................... Patch Set 5: (19 comments) Thanks for continuing this change, Gabor! Left a few comments, but nothing serious. http://gerrit.cloudera.org:8080/#/c/20077/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20077/5//COMMIT_MSG@9 PS5, Line 9: HDFS I think 'Hive tables' or 'legacy Hive tables' are more precise, as the tables can be stored anywhere, e.g. in object stores. http://gerrit.cloudera.org:8080/#/c/20077/5//COMMIT_MSG@11 PS5, Line 11: an external table I think 'non-transactional' is less confusing here. http://gerrit.cloudera.org:8080/#/c/20077/5//COMMIT_MSG@13 PS5, Line 13: HDFS 'in a distributed filesystem or object store' http://gerrit.cloudera.org:8080/#/c/20077/5//COMMIT_MSG@18 PS5, Line 18: as is http://gerrit.cloudera.org:8080/#/c/20077/5//COMMIT_MSG@26 PS5, Line 26: - table is an external table : - table is not a transactional table >From Hive3+ external table == not a transactional table, so the first two >points mean the same. I propose to keep the 'not a transactional table' as >this might be less confusing. http://gerrit.cloudera.org:8080/#/c/20077/5//COMMIT_MSG@30 PS5, Line 30: This an missing verb http://gerrit.cloudera.org:8080/#/c/20077/5//COMMIT_MSG@32 PS5, Line 32: new Iceberg table will have the 'external.table.purge' property set to : true after the migration. Why don't we keep the original value? (though currently Iceberg tables with external.table.purge are buggy, but that's a different story) http://gerrit.cloudera.org:8080/#/c/20077/5//COMMIT_MSG@42 PS5, Line 42: <random_ID> Can we use the query id instead? http://gerrit.cloudera.org:8080/#/c/20077/5//COMMIT_MSG@46 PS5, Line 46: - Step 5: For an Iceberg table in Hadoop Catalog, run a CREATE TABLE : query to add the Iceberg table to HMS as well. : - Step 6: For an Iceberg table in Hadoop Catalog, set the : 'external.table.purge' property to true in an ALTER TABLE : query. So are these not relevant for tables in the HiveCatalog? Then probably it would be better to mark them as optional steps. http://gerrit.cloudera.org:8080/#/c/20077/5//COMMIT_MSG@56 PS5, Line 56: - Manually tested the runtime performance for a table that is : unpartitioned and has 10k data files. The runtime is around 10-13s. Could you please also try to convert tpcds.store_sales at a scale of 10? http://gerrit.cloudera.org:8080/#/c/20077/5/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: http://gerrit.cloudera.org:8080/#/c/20077/5/be/src/service/client-request-state.cc@2240 PS5, Line 2240: some table properties Could you add a pointer to the Frontend code that sets these table properties? http://gerrit.cloudera.org:8080/#/c/20077/5/be/src/service/client-request-state.cc@2240 PS5, Line 2240: the nit: redundant 'the' http://gerrit.cloudera.org:8080/#/c/20077/5/be/src/service/client-request-state.cc@2281 PS5, Line 2281: child_queries This hides the outer child_queries which can be confusing later, so maybe rename this to 'iceberg_child_queries'? http://gerrit.cloudera.org:8080/#/c/20077/5/be/src/service/client-request-state.cc@2283 PS5, Line 2283: params.__isset.create_iceberg_table_query Isn't it always set? http://gerrit.cloudera.org:8080/#/c/20077/5/fe/src/main/java/org/apache/impala/analysis/ConvertTableToIcebergStmt.java File fe/src/main/java/org/apache/impala/analysis/ConvertTableToIcebergStmt.java: http://gerrit.cloudera.org:8080/#/c/20077/5/fe/src/main/java/org/apache/impala/analysis/ConvertTableToIcebergStmt.java@90 PS5, Line 90: if (!MetaStoreUtils.isExternalTable(table.getMetaStoreTable())) { : throw new AnalysisException( : "CONVERT TO ICEBERG is not supported for managed tables"); : } : : if (table.getMetaStoreTable().getParameters() != null && : AcidUtils.isTransactionalTable(table.getMetaStoreTable().getParameters())) { : throw new AnalysisException( : "CONVERT TO ICEBERG is not supported for transactional tables"); : } These will be true/false at the same time. Maybe just keep the one about transactional tables? http://gerrit.cloudera.org:8080/#/c/20077/5/fe/src/main/java/org/apache/impala/analysis/ConvertTableToIcebergStmt.java@115 PS5, Line 115: if (TIcebergCatalog.HADOOP_CATALOG == IcebergUtil.getTIcebergCatalog(properties_)) { : throw new AnalysisException("The Hadoop Catalog is not supported because the " + : "location may change"); : } The commit message mentions Hadoop Catalog. http://gerrit.cloudera.org:8080/#/c/20077/5/fe/src/main/java/org/apache/impala/analysis/ConvertTableToIcebergStmt.java@120 PS5, Line 120: Iceberg issue #7612 Could you please add the link https://github.com/apache/iceberg/issues/7612 ? http://gerrit.cloudera.org:8080/#/c/20077/5/fe/src/main/java/org/apache/impala/analysis/ConvertTableToIcebergStmt.java@126 PS5, Line 126: if (partitionValue.contains("/")) { : throw new AnalysisException ("Can't migrate table with '/' in the partition " + : "values until Iceberg #7612 is fixed. '" + partitionValue + "'"); : } Shouldn't we check for any UTF-encoded char? http://gerrit.cloudera.org:8080/#/c/20077/5/fe/src/main/java/org/apache/impala/util/MigrateTableUtil.java File fe/src/main/java/org/apache/impala/util/MigrateTableUtil.java: http://gerrit.cloudera.org:8080/#/c/20077/5/fe/src/main/java/org/apache/impala/util/MigrateTableUtil.java@108 PS5, Line 108: if (IcebergUtil.isHiveCatalog(props)) { : catalog.dropTable(tableName.getDb(), tableName.getTbl(), false); : } Please add comment why we are dropping the table. -- To view, visit http://gerrit.cloudera.org:8080/20077 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iacdad996d680fe545cc9a45e6bc64a348a64cd80 Gerrit-Change-Number: 20077 Gerrit-PatchSet: 5 Gerrit-Owner: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Anonymous Coward <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Tamas Mate <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Thu, 22 Jun 2023 19:14:26 +0000 Gerrit-HasComments: Yes
