Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16534 )

Change subject: IMPALA-10205: Replace MD5 with Murmur3 for generating datafile path hash
......................................................................


Patch Set 3:

Looking at THdfsFileDesc, it might not be possible to avoid memory overhead on the coordinator side.

The file descriptors store only the relative path, but we need the absolute path to filter Iceberg data files, so we have to construct each path from <table location> + <file desc relative path>.
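
To make the per-file cost concrete, here is a minimal Java sketch. It assumes a hypothetical helper `dataFilePathHash(tableLocation, relativePath)`: a new absolute-path String is concatenated for every file before hashing, and the hash is the standard 32-bit MurmurHash3 (x86_32 variant) over the UTF-8 bytes. The separator handling, the seed of 0, and the choice of the 32-bit variant are assumptions, not necessarily what this patch does.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; names, seed and hash variant are assumptions,
// not Impala's actual implementation.
final class DataFilePathHash {
  private DataFilePathHash() {}

  // Builds <table location> + "/" + <relative path>. This allocates a new
  // String for every file descriptor, which is the coordinator-side cost.
  static String absolutePath(String tableLocation, String relativePath) {
    if (tableLocation.endsWith("/")) return tableLocation + relativePath;
    return tableLocation + "/" + relativePath;
  }

  // Standard MurmurHash3 x86_32 over a byte array.
  static int murmur3x86_32(byte[] data, int seed) {
    final int c1 = 0xcc9e2d51;
    final int c2 = 0x1b873593;
    int h = seed;
    int nBlocks = data.length / 4;
    // Body: mix each 4-byte little-endian block into the state.
    for (int i = 0; i < nBlocks; i++) {
      int off = i * 4;
      int k = (data[off] & 0xff)
          | (data[off + 1] & 0xff) << 8
          | (data[off + 2] & 0xff) << 16
          | (data[off + 3] & 0xff) << 24;
      k *= c1;
      k = Integer.rotateLeft(k, 15);
      k *= c2;
      h ^= k;
      h = Integer.rotateLeft(h, 13);
      h = h * 5 + 0xe6546b64;
    }
    // Tail: the remaining 1-3 bytes (intentional fall-through).
    int tail = nBlocks * 4;
    int k1 = 0;
    switch (data.length & 3) {
      case 3:
        k1 ^= (data[tail + 2] & 0xff) << 16;
      case 2:
        k1 ^= (data[tail + 1] & 0xff) << 8;
      case 1:
        k1 ^= (data[tail] & 0xff);
        k1 *= c1;
        k1 = Integer.rotateLeft(k1, 15);
        k1 *= c2;
        h ^= k1;
        break;
      default:
        break;
    }
    // Finalization: fold in the length, then apply the fmix32 avalanche.
    h ^= data.length;
    h ^= h >>> 16;
    h *= 0x85ebca6b;
    h ^= h >>> 13;
    h *= 0xc2b2ae35;
    h ^= h >>> 16;
    return h;
  }

  // Hypothetical entry point: hash of the absolute data file path.
  static int dataFilePathHash(String tableLocation, String relativePath) {
    byte[] bytes = absolutePath(tableLocation, relativePath)
        .getBytes(StandardCharsets.UTF_8);
    return murmur3x86_32(bytes, 0);
  }
}
```

Whatever hash is chosen, the concatenation itself is unavoidable as long as descriptors keep only relative paths. Switching from MD5 to Murmur3 reduces the hashing cost, but not the extra String allocation per file.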

So maybe we should just stick with Murmur3?

I've spotted another issue with our FileDescriptor class: getRelativePath() reads its value from the underlying FlatBuffer:

https://github.com/apache/impala/blob/d453d52aadcbd158147b906813b22eb2944ac90b/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java#L231

Since the FlatBuffer doesn't hold a Java String object, every call to fileDesc.getRelativePath() constructs and returns a new Java String.
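
A minimal sketch of the effect, assuming a simplified stand-in for the FlatBuffers-backed descriptor. The relative path is stored only as UTF-8 bytes in a ByteBuffer, and the getter decodes them on every call. `FlatFileDesc` and `CachingFileDesc` are hypothetical names, not Impala classes. The caching variant avoids repeated allocation but keeps an extra String per descriptor on the heap, which is the same memory trade-off raised above for THdfsFileDesc.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a FlatBuffers-backed file descriptor: the
// relative path lives only as UTF-8 bytes inside a ByteBuffer.
final class FlatFileDesc {
  private final ByteBuffer buf;

  FlatFileDesc(String relativePath) {
    this.buf = ByteBuffer.wrap(relativePath.getBytes(StandardCharsets.UTF_8));
  }

  // Decodes the bytes on every call, so each call allocates a new String.
  // duplicate() keeps the shared buffer's position untouched.
  String getRelativePath() {
    return StandardCharsets.UTF_8.decode(buf.duplicate()).toString();
  }
}

// Hypothetical variant that decodes once and keeps the String,
// trading extra heap per descriptor for fewer allocations.
final class CachingFileDesc {
  private final FlatFileDesc delegate;
  private String cachedPath; // lazily initialized; not thread-safe as written

  CachingFileDesc(FlatFileDesc delegate) {
    this.delegate = delegate;
  }

  String getRelativePath() {
    if (cachedPath == null) cachedPath = delegate.getRelativePath();
    return cachedPath;
  }
}
```

Because Java Strings are immutable, handing out the same cached instance is safe. The open question is only whether the extra heap is acceptable for the number of descriptors the coordinator holds.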


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If7c805f2fdf0cf5a69738579c7e55f4bd047ed59
Gerrit-Change-Number: 16534
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong
Gerrit-Reviewer: Wenzhe Zhou
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Reviewer: wangsheng
Gerrit-Comment-Date: Tue, 06 Oct 2020 08:23:46 +0000
Gerrit-HasComments: No
