Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19776 )

Change subject: IMPALA-11877: (part 1) Add support for DELETE statements for 
UNPARTITIONED Iceberg tables
......................................................................


Patch Set 10:

(10 comments)

Excellent work, Zoli! Sorry, for the long delay with the review.

http://gerrit.cloudera.org:8080/#/c/19776/10/fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
File fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java:

http://gerrit.cloudera.org:8080/#/c/19776/10/fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java@467
PS10, Line 467: List<String> path = 
Path.createRawPath(targetTableRef_.getUniqueAlias(), k);
              :         SlotRef ref = new SlotRef(path);
              :         ref.analyze(analyzer);
              :         selectList.add(new SelectListItem(ref, null));
              :         uniqueSlots.add(ref.getSlotId());
              :         keySlots.add(ref.getSlotId());
              :         referencedColumns.add(colIndexMap.get(k));
For me it seems that the body of the for loop is identical between the Iceberg 
and Kudu implementations just the iterated items are different. Could you move 
this to a common place between the 2 implementations?


http://gerrit.cloudera.org:8080/#/c/19776/10/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/19776/10/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@340
PS10, Line 340: get
You can use the getOrDefault(..) method on properties() to simplify this 
function.
String writeFormat = ... properties().getOrDefault(DEFAULT_FILE_FORMAT, 
DEFAULT_FILE_FORMAT_DEFAULT);
return getIceergFormat(writeFormat);


http://gerrit.cloudera.org:8080/#/c/19776/10/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@349
PS10, Line 349:         
getIcebergApiTable().properties().get(TableProperties.DELETE_DEFAULT_FILE_FORMAT);
You can use getOrDefault() here as well, where the default value is 
getWriteFileFormat().


http://gerrit.cloudera.org:8080/#/c/19776/10/fe/src/main/java/org/apache/impala/planner/IcebergDeleteSink.java
File fe/src/main/java/org/apache/impala/planner/IcebergDeleteSink.java:

http://gerrit.cloudera.org:8080/#/c/19776/10/fe/src/main/java/org/apache/impala/planner/IcebergDeleteSink.java@28
PS10, Line 28: import org.slf4j.Logger;
Logger and LoggerFactory aren't used.


http://gerrit.cloudera.org:8080/#/c/19776/10/fe/src/main/java/org/apache/impala/planner/IcebergDeleteSink.java@61
PS10, Line 61:     long numBufferedPartitionsPerInstance = 1;
These variables can be final, if I'm not mistaken.


http://gerrit.cloudera.org:8080/#/c/19776/10/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/19776/10/fe/src/main/java/org/apache/impala/service/Frontend.java@2548
PS10, Line 2548:   private static void addFinalizationParamsForDelete(
nit: I think the indentation is off in this function.


http://gerrit.cloudera.org:8080/#/c/19776/10/fe/src/main/java/org/apache/impala/service/Frontend.java@2555
PS10, Line 2555: TFinalizeParams finalizeParams = new TFinalizeParams();
               :       
finalizeParams.setTable_name(targetTable.getTableName().getTbl());
               :       
finalizeParams.setTable_id(DescriptorTable.TABLE_SINK_ID);
               :       String db = targetTable.getTableName().getDb();
               :       finalizeParams.setTable_db(db == null ? 
queryCtx.session.database : db);
               :       FeFsTable hdfsTable = (FeFsTable) targetTable;
               :       
finalizeParams.setHdfs_base_dir(hdfsTable.getHdfsBaseDir());
This part is kind of the same as in addFinalizatonParamsForInsert(). Would it 
make sense to introduce a common function to populate the non-iceberg part?


http://gerrit.cloudera.org:8080/#/c/19776/10/testdata/workloads/functional-query/queries/QueryTest/iceberg-delete-complex.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-delete-complex.test:

http://gerrit.cloudera.org:8080/#/c/19776/10/testdata/workloads/functional-query/queries/QueryTest/iceberg-delete-complex.test@3
PS10, Line 3: # Delete from empty table is no-op.
I think this comment is not relevant here.


http://gerrit.cloudera.org:8080/#/c/19776/10/testdata/workloads/functional-query/queries/QueryTest/iceberg-delete-complex.test@90
PS10, Line 90: ====
Could you give it a try to include an UNNEST() fn call into the select list 
when querying the INPUT__FILE__NAME and also having delete files?


http://gerrit.cloudera.org:8080/#/c/19776/10/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/19776/10/tests/query_test/test_iceberg.py@1163
PS10, Line 1163:     self.run_stmt_in_hive("SELECT * FROM {} ORDER BY 
i".format(ice_delete))
You run this 2 SELECTs twice. I think this line can be dropped.



--
To view, visit http://gerrit.cloudera.org:8080/19776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic933b2295abe54b46d2a736961219988ff42915b
Gerrit-Change-Number: 19776
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Andrew Sherman <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tamas Mate <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Tue, 06 Jun 2023 07:43:52 +0000
Gerrit-HasComments: Yes

Reply via email to