Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22367 )

Change subject: IMPALA-13654: Tolerate missing data files of Iceberg tables
......................................................................


Patch Set 4:

(4 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/22367/4/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
File fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java:

http://gerrit.cloudera.org:8080/#/c/22367/4/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@232
PS4, Line 232:       missingFiles_.add(contentFile.path().toString());
> Can you make missingFiles_ a map that stores the original exceptions so the
It's possible, but I'm afraid about the case when there are lots os missing 
files due to mass deletes. In such cases we would redundantly consume too much 
memory, might require too large buffers for network traffic between CatalogD 
<-> Coordinator, and it could be very noisy for the end user.

How about just showing the list of missing data files (or even just the first N 
data files) and redirect the user to check the CatalogD logs where they should 
be able to find the original exception for every file?


http://gerrit.cloudera.org:8080/#/c/22367/4/fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java
File fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java:

http://gerrit.cloudera.org:8080/#/c/22367/4/fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java@183
PS4, Line 183:       LOG.warn(String.format("Missing Iceberg content file: %s",
> Consider storing the original exceptions here also.
Good catch, done.


http://gerrit.cloudera.org:8080/#/c/22367/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/22367/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@180
PS4, Line 180:       throw new ImpalaRuntimeException(String.format(
> Throw original exception here.
Added reference to CatalogD logs.


http://gerrit.cloudera.org:8080/#/c/22367/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@728
PS4, Line 728:         throw new ImpalaRuntimeException("Cannot find file: " + 
cf.path() +
> Throw original exception here.
Added reference to CatalogD logs.



--
To view, visit http://gerrit.cloudera.org:8080/22367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If753619d8ee1b30f018e90157ff7bdbe5d7f1525
Gerrit-Change-Number: 22367
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Thu, 06 Feb 2025 14:32:59 +0000
Gerrit-HasComments: Yes

Reply via email to