Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/22367 )
Change subject: IMPALA-13654: Tolerate missing data files of Iceberg tables ...................................................................... Patch Set 4: (4 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/22367/4/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java File fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java: http://gerrit.cloudera.org:8080/#/c/22367/4/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@232 PS4, Line 232: missingFiles_.add(contentFile.path().toString()); > Can you make missingFiles_ a map that stores the original exceptions so the It's possible, but I'm afraid about the case when there are lots os missing files due to mass deletes. In such cases we would redundantly consume too much memory, might require too large buffers for network traffic between CatalogD <-> Coordinator, and it could be very noisy for the end user. How about just showing the list of missing data files (or even just the first N data files) and redirect the user to check the CatalogD logs where they should be able to find the original exception for every file? http://gerrit.cloudera.org:8080/#/c/22367/4/fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java File fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java: http://gerrit.cloudera.org:8080/#/c/22367/4/fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java@183 PS4, Line 183: LOG.warn(String.format("Missing Iceberg content file: %s", > Consider storing the original exceptions here also. Good catch, done. http://gerrit.cloudera.org:8080/#/c/22367/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java: http://gerrit.cloudera.org:8080/#/c/22367/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@180 PS4, Line 180: throw new ImpalaRuntimeException(String.format( > Throw original exception here. Added reference to CatalogD logs. http://gerrit.cloudera.org:8080/#/c/22367/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@728 PS4, Line 728: throw new ImpalaRuntimeException("Cannot find file: " + cf.path() + > Throw original exception here. Added reference to CatalogD logs. -- To view, visit http://gerrit.cloudera.org:8080/22367 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If753619d8ee1b30f018e90157ff7bdbe5d7f1525 Gerrit-Change-Number: 22367 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Thu, 06 Feb 2025 14:32:59 +0000 Gerrit-HasComments: Yes
