Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/23906 )
Change subject: IMPALA-13122: Add detailed file metadata statistics to table loading logs ...................................................................... Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/23906/2/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java File fe/src/main/java/org/apache/impala/catalog/FeFsTable.java: http://gerrit.cloudera.org:8080/#/c/23906/2/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java@124 PS2, Line 124: // Set of unique host indices (for HDFS/Ozone only) : public Set<Integer> uniqueHostIndices = new HashSet<>(); : // Set of unique disk IDs (for HDFS/Ozone only) : public Set<Short> uniqueDiskIds = new HashSet<>(); I am not sure about the usefulness of treating these as two separate sets - generally we are interested in the number of host and the number of host:disk pairs http://gerrit.cloudera.org:8080/#/c/23906/2/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java@188 PS2, Line 188: // Note: We cannot accurately update min/max values when removing stats : // Remove hosts and disks from the sets if they belong to the removed files not sure if it worth it, but remove() could be implemented with counter maps - when the number of blocks that reference it reaches 0, then it can be removed http://gerrit.cloudera.org:8080/#/c/23906/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/23906/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@802 PS2, Line 802: if (aggregatedStats.numFiles > 0) { : statsLog.append("Files: ").append(aggregatedStats.numFiles) : .append(", Blocks: ").append(aggregatedStats.numBlocks) : .append(", Total size: ") : .append(PrintUtils.printBytes(aggregatedStats.totalFileBytes)) : .append(", File sizes (min/avg/max): ") : .append(PrintUtils.printBytes(aggregatedStats.minFileBytes)).append("/") : .append(PrintUtils.printBytes(aggregatedStats.getAvgFileBytes())).append("/") : .append(PrintUtils.printBytes(aggregatedStats.maxFileBytes)); : : // Modification time statistics (formatted as dates) : if (aggregatedStats.minModificationTime != Long.MAX_VALUE && : aggregatedStats.maxModificationTime > 0) { : statsLog.append(", Modification times (min/max): ") : .append(dateFormatter.format(java.time.Instant.ofEpochMilli( : aggregatedStats.minModificationTime))) : .append("/") : .append(dateFormatter.format(java.time.Instant.ofEpochMilli( : aggregatedStats.maxModificationTime))); : } : : // Access time statistics (formatted as dates) : // Note: Access time may be 0 or not updated if disabled in HDFS for performance : if (aggregatedStats.minAccessTime != Long.MAX_VALUE && : aggregatedStats.maxAccessTime > 0) { : statsLog.append(", Access times (min/max): ") : .append(dateFormatter.format(java.time.Instant.ofEpochMilli( : aggregatedStats.minAccessTime))) : .append("/") : .append(dateFormatter.format(java.time.Instant.ofEpochMilli( : aggregatedStats.maxAccessTime))); : } : : // HDFS/Ozone specific stats : if (aggregatedStats.getNumUniqueHosts() > 0) { : statsLog.append(", Hosts: ").append(aggregatedStats.getNumUniqueHosts()); : } : if (aggregatedStats.getNumUniqueDisks() > 0) { : statsLog.append(", Disks: ").append(aggregatedStats.getNumUniqueDisks()); : } : } This looks like something that could be moved to a function within FileMetadataStats -- To view, visit http://gerrit.cloudera.org:8080/23906 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6f4592f173c047e5064058402f83be6d1f5c9a79 Gerrit-Change-Number: 23906 Gerrit-PatchSet: 2 Gerrit-Owner: Arnab Karmakar <[email protected]> Gerrit-Reviewer: Arnab Karmakar <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Jason Fehr <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Pranav Lodha <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Tue, 27 Jan 2026 15:12:37 +0000 Gerrit-HasComments: Yes
