Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21959 )

Change subject: IMPALA-13370: Read Puffin stats from metadata.json property if 
available
......................................................................


Patch Set 7:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/21959/6/fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java
File fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java:

http://gerrit.cloudera.org:8080/#/c/21959/6/fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java@93
PS6, Line 93:
            :   private void loadStatsFromMetadata(StatisticsFile statsFile) {
            :     final long
> No need to return this list anymore.
Done


http://gerrit.cloudera.org:8080/#/c/21959/6/fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java@127
PS6, Line 127: rack of the Ic
> We don't have this parameter anymore
Done


http://gerrit.cloudera.org:8080/#/c/21959/6/fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java@133
PS6, Line 133:     try {
> No need to check NULL-ness.
Done


http://gerrit.cloudera.org:8080/#/c/21959/6/fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java@143
PS6, Line 143:     : puffinReader.readAll(blobs)) {
> Is it possible that the BlobMetadata in the Puffin file contains the NDV wh
I don't think the spec explicitly prohibits it, but I don't think it's a use 
case for which we should add code complexity.


http://gerrit.cloudera.org:8080/#/c/21959/6/fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java@182
PS6, Line 182:
             :     return res;
             :   }
             :
             :   private long 
getNdvFromMetadata(org.apache.iceberg.BlobMetadata blob) {
             :     Strin
> Do we need this duplicate check? Aren't these already reported during the f
In the first round we only registered NDVs in 'result_' where it was available 
in the metadata.json. If it was missing or invalid we didn't register it.
It is possible that there are two Puffin files with NDVs for the same column, 
but neither of them (or only one of them) has a value in the metadata.json. 
This situation wouldn't be discovered in the first loop.


http://gerrit.cloudera.org:8080/#/c/21959/6/fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java@201
PS6, Line 201:     }
> Maybe we could log the invalid "ndv" string.
Done



--
To view, visit http://gerrit.cloudera.org:8080/21959
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5e92056ce97c4849742db6309562af3b575f647b
Gerrit-Change-Number: 21959
Gerrit-PatchSet: 7
Gerrit-Owner: Daniel Becker <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Fri, 22 Nov 2024 10:37:48 +0000
Gerrit-HasComments: Yes

Reply via email to