Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/23946 )
Change subject: IMPALA-14623: Optimize memory usage for Iceberg file path hashes ...................................................................... Patch Set 1: (4 comments) Thanks for working on this! http://gerrit.cloudera.org:8080/#/c/23946/1/be/src/util/thash128-util.h File be/src/util/thash128-util.h: http://gerrit.cloudera.org:8080/#/c/23946/1/be/src/util/thash128-util.h@20 PS1, Line 20: #include <tuple> Unnecessary include of 'tuple'. But we should probably add: #include "gen-cpp/CatalogObjects_types.h" http://gerrit.cloudera.org:8080/#/c/23946/1/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: http://gerrit.cloudera.org:8080/#/c/23946/1/common/thrift/CatalogObjects.thrift@667 PS1, Line 667: Murmur3 If we already change things, we should also switch to XXH3. http://gerrit.cloudera.org:8080/#/c/23946/1/fe/src/main/java/org/apache/impala/util/IcebergUtil.java File fe/src/main/java/org/apache/impala/util/IcebergUtil.java: http://gerrit.cloudera.org:8080/#/c/23946/1/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@748 PS1, Line 748: // Divide 16 bytes into two longs. : java.nio.ByteBuffer buffer = java.nio.ByteBuffer.wrap(hashBytes); : long high = buffer.getLong(); : long low = buffer.getLong(); ByteBuffer adds unnecessary allocation overhead. I think we should optimize this method as much as we can because it will get invoked many times. We could use a manual algorithm: long high = 0; for (int i = 0; i < 8; i++) { high = (high << 8) | (hashBytes[i] & 0xFFL); } long low = 0; for (int i = 8; i < 16; i++) { low = (low << 8) | (hashBytes[i] & 0xFFL); } Probably the fastest would be to use OpenHFT and XXH3: import net.openhft.hashing.LongTupleHashFunction; LongTupleHashFunction hasher = LongTupleHashFunction.xx_h3(); long[] result = hasher.hashChars(path); return new Hash128(result[0], result[1]); Out of curiosity, would it be possible to do some micro-benchmarking and share the results? http://gerrit.cloudera.org:8080/#/c/23946/1/fe/src/test/java/org/apache/impala/util/Hash128Test.java File fe/src/test/java/org/apache/impala/util/Hash128Test.java: http://gerrit.cloudera.org:8080/#/c/23946/1/fe/src/test/java/org/apache/impala/util/Hash128Test.java@20 PS1, Line 20: * nit: we prefer non-wildcard imports -- To view, visit http://gerrit.cloudera.org:8080/23946 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie0de793de2434dae3b60c3aa4f87dba203eee3c1 Gerrit-Change-Number: 23946 Gerrit-PatchSet: 1 Gerrit-Owner: Arnab Karmakar <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Fri, 06 Feb 2026 10:59:59 +0000 Gerrit-HasComments: Yes
