Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/19776 )
Change subject: IMPALA-11877: (part 1) Add support for DELETE statements for UNPARTITIONED Iceberg tables ...................................................................... Patch Set 10: (10 comments) Excellent work, Zoli! Sorry, for the long delay with the review. http://gerrit.cloudera.org:8080/#/c/19776/10/fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java File fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java: http://gerrit.cloudera.org:8080/#/c/19776/10/fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java@467 PS10, Line 467: List<String> path = Path.createRawPath(targetTableRef_.getUniqueAlias(), k); : SlotRef ref = new SlotRef(path); : ref.analyze(analyzer); : selectList.add(new SelectListItem(ref, null)); : uniqueSlots.add(ref.getSlotId()); : keySlots.add(ref.getSlotId()); : referencedColumns.add(colIndexMap.get(k)); For me it seems that the body of the for loop is identical between the Iceberg and Kudu implementations just the iterated items are different. Could you move this to a common place between the 2 implementations? http://gerrit.cloudera.org:8080/#/c/19776/10/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java: http://gerrit.cloudera.org:8080/#/c/19776/10/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@340 PS10, Line 340: get You can use the getOrDefault(..) method on properties() to simplify this function. String writeFormat = ... properties().getOrDefault(DEFAULT_FILE_FORMAT, DEFAULT_FILE_FORMAT_DEFAULT); return getIceergFormat(writeFormat); http://gerrit.cloudera.org:8080/#/c/19776/10/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@349 PS10, Line 349: getIcebergApiTable().properties().get(TableProperties.DELETE_DEFAULT_FILE_FORMAT); You can use getOrDefault() here as well, where the default value is getWriteFileFormat(). http://gerrit.cloudera.org:8080/#/c/19776/10/fe/src/main/java/org/apache/impala/planner/IcebergDeleteSink.java File fe/src/main/java/org/apache/impala/planner/IcebergDeleteSink.java: http://gerrit.cloudera.org:8080/#/c/19776/10/fe/src/main/java/org/apache/impala/planner/IcebergDeleteSink.java@28 PS10, Line 28: import org.slf4j.Logger; Logger and LoggerFactory aren't used. http://gerrit.cloudera.org:8080/#/c/19776/10/fe/src/main/java/org/apache/impala/planner/IcebergDeleteSink.java@61 PS10, Line 61: long numBufferedPartitionsPerInstance = 1; These variables can be final, if I'm not mistaken. http://gerrit.cloudera.org:8080/#/c/19776/10/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/19776/10/fe/src/main/java/org/apache/impala/service/Frontend.java@2548 PS10, Line 2548: private static void addFinalizationParamsForDelete( nit: I think the indentation is off in this function. http://gerrit.cloudera.org:8080/#/c/19776/10/fe/src/main/java/org/apache/impala/service/Frontend.java@2555 PS10, Line 2555: TFinalizeParams finalizeParams = new TFinalizeParams(); : finalizeParams.setTable_name(targetTable.getTableName().getTbl()); : finalizeParams.setTable_id(DescriptorTable.TABLE_SINK_ID); : String db = targetTable.getTableName().getDb(); : finalizeParams.setTable_db(db == null ? queryCtx.session.database : db); : FeFsTable hdfsTable = (FeFsTable) targetTable; : finalizeParams.setHdfs_base_dir(hdfsTable.getHdfsBaseDir()); This part is kind of the same as in addFinalizatonParamsForInsert(). Would it make sense to introduce a common function to populate the non-iceberg part? http://gerrit.cloudera.org:8080/#/c/19776/10/testdata/workloads/functional-query/queries/QueryTest/iceberg-delete-complex.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-delete-complex.test: http://gerrit.cloudera.org:8080/#/c/19776/10/testdata/workloads/functional-query/queries/QueryTest/iceberg-delete-complex.test@3 PS10, Line 3: # Delete from empty table is no-op. I think this comment is not relevant here. http://gerrit.cloudera.org:8080/#/c/19776/10/testdata/workloads/functional-query/queries/QueryTest/iceberg-delete-complex.test@90 PS10, Line 90: ==== Could you give it a try to include an UNNEST() fn call into the select list when querying the INPUT__FILE__NAME and also having delete files? http://gerrit.cloudera.org:8080/#/c/19776/10/tests/query_test/test_iceberg.py File tests/query_test/test_iceberg.py: http://gerrit.cloudera.org:8080/#/c/19776/10/tests/query_test/test_iceberg.py@1163 PS10, Line 1163: self.run_stmt_in_hive("SELECT * FROM {} ORDER BY i".format(ice_delete)) You run this 2 SELECTs twice. I think this line can be dropped. -- To view, visit http://gerrit.cloudera.org:8080/19776 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic933b2295abe54b46d2a736961219988ff42915b Gerrit-Change-Number: 19776 Gerrit-PatchSet: 10 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Andrew Sherman <[email protected]> Gerrit-Reviewer: Anonymous Coward <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Tamas Mate <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Tue, 06 Jun 2023 07:43:52 +0000 Gerrit-HasComments: Yes
