Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16534 )
Change subject: IMPALA-10205: Replace MD5 with Murmur3 for generating datafile path hash
......................................................................


Patch Set 3:

Looking at THdfsFileDesc, it might not be possible to avoid memory overhead on the coordinator side. The file descriptors only store the relative path, but we need the absolute path to filter Iceberg data files, so we have to construct the paths as <table location> + <file desc relative path>. So maybe we should just stick with Murmur3?

I've spotted another issue with our FileDescriptor class. getRelativePath() reads its value from the underlying flat buffer:
https://github.com/apache/impala/blob/d453d52aadcbd158147b906813b22eb2944ac90b/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java#L231
Since the flat buffer doesn't hold a Java String object, every call to fileDesc.getRelativePath() constructs and returns a new Java String.


-- 
To view, visit http://gerrit.cloudera.org:8080/16534
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If7c805f2fdf0cf5a69738579c7e55f4bd047ed59
Gerrit-Change-Number: 16534
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: wangsheng <[email protected]>
Gerrit-Comment-Date: Tue, 06 Oct 2020 08:23:46 +0000
Gerrit-HasComments: No
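The two points in the Patch Set 3 comment (absolute paths must be rebuilt from <table location> + <relative path>, and getRelativePath() allocates a fresh String on every call because the value lives in a flat buffer) can be illustrated with a small, self-contained sketch. It does NOT use Impala's real FileDescriptor or the FlatBuffers library: `FlatFileDesc`, `IcebergPathFilterSketch`, and `absolutePaths` are hypothetical names, the path is simply kept as UTF-8 bytes in a `ByteBuffer` to mimic a flat-buffer-backed descriptor, and joining location and relative path with "/" is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical stand-in for a flat-buffer-backed file descriptor: the relative
 * path is stored only as UTF-8 bytes, never as a java.lang.String field.
 */
final class FlatFileDesc {
  private final ByteBuffer pathBytes;

  FlatFileDesc(String relativePath) {
    this.pathBytes = ByteBuffer.wrap(relativePath.getBytes(StandardCharsets.UTF_8));
  }

  /** Decodes the stored bytes, so every call returns a newly allocated String. */
  String getRelativePath() {
    // duplicate() gives an independent position, so decoding never disturbs the shared buffer.
    ByteBuffer view = pathBytes.duplicate();
    byte[] out = new byte[view.remaining()];
    view.get(out);
    return new String(out, StandardCharsets.UTF_8);
  }
}

final class IcebergPathFilterSketch {
  /**
   * Builds absolute paths as <table location> + "/" + <relative path> (separator assumed).
   * Each descriptor costs one decoded String plus one concatenated String on the coordinator.
   */
  static Set<String> absolutePaths(String tableLocation, List<FlatFileDesc> descs) {
    Set<String> result = new HashSet<>();
    for (FlatFileDesc fd : descs) {
      result.add(tableLocation + "/" + fd.getRelativePath());
    }
    return result;
  }

  public static void main(String[] args) {
    FlatFileDesc fd = new FlatFileDesc("data/00000-0-example.parquet");
    String first = fd.getRelativePath();
    String second = fd.getRelativePath();
    System.out.println(first.equals(second)); // prints "true": same characters
    System.out.println(first == second);      // prints "false": two distinct String objects

    Set<String> paths = absolutePaths("hdfs://namenode/warehouse/ice_tbl", List.of(fd));
    System.out.println(paths); // prints "[hdfs://namenode/warehouse/ice_tbl/data/00000-0-example.parquet]"
  }
}
```

In this sketch, filtering N data files materializes at least 2N Strings on the coordinator, which is the memory overhead the comment suspects cannot be avoided when only relative paths are stored.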
