Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23455 )

Change subject: IMPALA-14108: Add support for SHOW FILES IN table PARTITION for 
Iceberg tables
......................................................................


Patch Set 3:

(7 comments)

Left a few minor comments, but it looks good overall!

http://gerrit.cloudera.org:8080/#/c/23455/3/common/thrift/Frontend.thrift
File common/thrift/Frontend.thrift:

http://gerrit.cloudera.org:8080/#/c/23455/3/common/thrift/Frontend.thrift@304
PS3, Line 304: iceberg_files
This field can be useful for other table formats in the future, e.g. Hudi, 
Paimon. I think a more generic name, e.g. "selected_files" would be better.


http://gerrit.cloudera.org:8080/#/c/23455/3/fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java
File fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java:

http://gerrit.cloudera.org:8080/#/c/23455/3/fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java@168
PS3, Line 168: if (!deleteFiles.isEmpty()) {
This if statement unnecessarily encloses the inner for-loop. If deleteFiles is 
empty, then the for-loop would do nothing anyway.


http://gerrit.cloudera.org:8080/#/c/23455/3/fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java@169
PS3, Line 169: deleteFiles
nit: we could just have 'fileScanTask.deletes()' here, no need to introduce 
'deleteFiles'.


http://gerrit.cloudera.org:8080/#/c/23455/3/fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java@179
PS3, Line 179: uniquePaths.add(dataFilePath)
I don't think it hurts too much, but this will be always true. Each 
fileScanTask object refers to a different data file.


http://gerrit.cloudera.org:8080/#/c/23455/3/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/23455/3/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@493
PS3, Line 493:         rowBuilder.add("Unknown");  // Size - could be enhanced 
later
> It could be populated by passing a compound object instead of a string to t
We can retrieve the file descriptors here, IcebergUtil has

 public static String getFilePathHash(String path)

Then IcebergContentFileStore has getDataFileDescriptor(String pathHash) and 
getDeleteFileDescriptor(String pathHash).


http://gerrit.cloudera.org:8080/#/c/23455/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-show-files-partition.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-show-files-partition.test:

http://gerrit.cloudera.org:8080/#/c/23455/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-show-files-partition.test@33
PS3, Line 33: hour(hour_timestamp))
nit: fits earlier line, but for readability it would be even better to put 
different transforms to different lines:

  (year(year_date), year(year_timestamp),
  month(month_date), month(month_timestamp),
  day(day_date), day(day_timestamp),
  hour(hour_timestamp))


http://gerrit.cloudera.org:8080/#/c/23455/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-show-files-partition.test@106
PS3, Line 106: .*
Can we have ".*identity_boolean=true.*" here instead of just ".*"?

It's not critical to update it at all places, but at least where you have 
multiple result lines with the same patterns.



--
To view, visit http://gerrit.cloudera.org:8080/23455
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb9913e078e6842861bdbb004ed5d67286bd3152
Gerrit-Change-Number: 23455
Gerrit-PatchSet: 3
Gerrit-Owner: Mihaly Szjatinya <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Thu, 25 Sep 2025 17:43:35 +0000
Gerrit-HasComments: Yes

Reply via email to