Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/23455 )
Change subject: IMPALA-14108: Add support for SHOW FILES IN table PARTITION for Iceberg tables ...................................................................... Patch Set 3: (7 comments) Left a few minor comments, but it looks good overall! http://gerrit.cloudera.org:8080/#/c/23455/3/common/thrift/Frontend.thrift File common/thrift/Frontend.thrift: http://gerrit.cloudera.org:8080/#/c/23455/3/common/thrift/Frontend.thrift@304 PS3, Line 304: iceberg_files This field can be useful for other table formats in the future, e.g. Hudi, Paimon. I think a more generic name, e.g. "selected_files" would be better. http://gerrit.cloudera.org:8080/#/c/23455/3/fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java File fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java: http://gerrit.cloudera.org:8080/#/c/23455/3/fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java@168 PS3, Line 168: if (!deleteFiles.isEmpty()) { This if statement unnecessarily encloses the inner for-loop. If deleteFiles is empty, then the for-loop would do nothing anyway. http://gerrit.cloudera.org:8080/#/c/23455/3/fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java@169 PS3, Line 169: deleteFiles nit: we could just have 'fileScanTask.deletes()' here, no need to introduce 'deleteFiles'. http://gerrit.cloudera.org:8080/#/c/23455/3/fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java@179 PS3, Line 179: uniquePaths.add(dataFilePath) I don't think it hurts too much, but this will be always true. Each fileScanTask object refers to a different data file. http://gerrit.cloudera.org:8080/#/c/23455/3/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java: http://gerrit.cloudera.org:8080/#/c/23455/3/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@493 PS3, Line 493: rowBuilder.add("Unknown"); // Size - could be enhanced later > It could be populated by passing a compound object instead of a string to t We can retrieve the file descriptors here, IcebergUtil has public static String getFilePathHash(String path) Then IcebergContentFileStore has getDataFileDescriptor(String pathHash) and getDeleteFileDescriptor(String pathHash). http://gerrit.cloudera.org:8080/#/c/23455/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-show-files-partition.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-show-files-partition.test: http://gerrit.cloudera.org:8080/#/c/23455/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-show-files-partition.test@33 PS3, Line 33: hour(hour_timestamp)) nit: fits earlier line, but for readability it would be even better to put different transforms to different lines: (year(year_date), year(year_timestamp), month(month_date), month(month_timestamp), day(day_date), day(day_timestamp), hour(hour_timestamp)) http://gerrit.cloudera.org:8080/#/c/23455/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-show-files-partition.test@106 PS3, Line 106: .* Can we have ".*identity_boolean=true.*" here instead of just ".*"? It's not critical to update it at all places, but at least where you have multiple result lines with the same patterns. -- To view, visit http://gerrit.cloudera.org:8080/23455 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibb9913e078e6842861bdbb004ed5d67286bd3152 Gerrit-Change-Number: 23455 Gerrit-PatchSet: 3 Gerrit-Owner: Mihaly Szjatinya <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]> Gerrit-Reviewer: Peter Rozsa <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Thu, 25 Sep 2025 17:43:35 +0000 Gerrit-HasComments: Yes
