Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23906 )

Change subject: IMPALA-13122: Add detailed file metadata statistics to table 
loading logs
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/23906/2/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
File fe/src/main/java/org/apache/impala/catalog/FeFsTable.java:

http://gerrit.cloudera.org:8080/#/c/23906/2/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java@124
PS2, Line 124:     // Set of unique host indices (for HDFS/Ozone only)
             :     public Set<Integer> uniqueHostIndices = new HashSet<>();
             :     // Set of unique disk IDs (for HDFS/Ozone only)
             :     public Set<Short> uniqueDiskIds = new HashSet<>();
I am not sure about the usefulness of treating these as two separate sets - 
generally we are interested in the number of host and the number of host:disk 
pairs


http://gerrit.cloudera.org:8080/#/c/23906/2/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java@188
PS2, Line 188:       // Note: We cannot accurately update min/max values when 
removing stats
             :       // Remove hosts and disks from the sets if they belong to 
the removed files
not sure if it worth it, but remove() could be implemented with counter maps - 
when the number of blocks that reference it reaches 0, then it can be removed


http://gerrit.cloudera.org:8080/#/c/23906/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/23906/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@802
PS2, Line 802:       if (aggregatedStats.numFiles > 0) {
             :         statsLog.append("Files: 
").append(aggregatedStats.numFiles)
             :             .append(", Blocks: 
").append(aggregatedStats.numBlocks)
             :             .append(", Total size: ")
             :             
.append(PrintUtils.printBytes(aggregatedStats.totalFileBytes))
             :             .append(", File sizes (min/avg/max): ")
             :             
.append(PrintUtils.printBytes(aggregatedStats.minFileBytes)).append("/")
             :             
.append(PrintUtils.printBytes(aggregatedStats.getAvgFileBytes())).append("/")
             :             
.append(PrintUtils.printBytes(aggregatedStats.maxFileBytes));
             :
             :         // Modification time statistics (formatted as dates)
             :         if (aggregatedStats.minModificationTime != 
Long.MAX_VALUE &&
             :             aggregatedStats.maxModificationTime > 0) {
             :           statsLog.append(", Modification times (min/max): ")
             :               
.append(dateFormatter.format(java.time.Instant.ofEpochMilli(
             :                   aggregatedStats.minModificationTime)))
             :               .append("/")
             :               
.append(dateFormatter.format(java.time.Instant.ofEpochMilli(
             :                   aggregatedStats.maxModificationTime)));
             :         }
             :
             :         // Access time statistics (formatted as dates)
             :         // Note: Access time may be 0 or not updated if disabled 
in HDFS for performance
             :         if (aggregatedStats.minAccessTime != Long.MAX_VALUE &&
             :             aggregatedStats.maxAccessTime > 0) {
             :           statsLog.append(", Access times (min/max): ")
             :               
.append(dateFormatter.format(java.time.Instant.ofEpochMilli(
             :                   aggregatedStats.minAccessTime)))
             :               .append("/")
             :               
.append(dateFormatter.format(java.time.Instant.ofEpochMilli(
             :                   aggregatedStats.maxAccessTime)));
             :         }
             :
             :         // HDFS/Ozone specific stats
             :         if (aggregatedStats.getNumUniqueHosts() > 0) {
             :           statsLog.append(", Hosts: 
").append(aggregatedStats.getNumUniqueHosts());
             :         }
             :         if (aggregatedStats.getNumUniqueDisks() > 0) {
             :           statsLog.append(", Disks: 
").append(aggregatedStats.getNumUniqueDisks());
             :         }
             :       }
This looks like something that could be moved to a function within 
FileMetadataStats



--
To view, visit http://gerrit.cloudera.org:8080/23906
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6f4592f173c047e5064058402f83be6d1f5c9a79
Gerrit-Change-Number: 23906
Gerrit-PatchSet: 2
Gerrit-Owner: Arnab Karmakar <[email protected]>
Gerrit-Reviewer: Arnab Karmakar <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Jason Fehr <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Pranav Lodha <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Tue, 27 Jan 2026 15:12:37 +0000
Gerrit-HasComments: Yes

Reply via email to