wangsheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16534 )

Change subject: IMPALA-10205: Replace MD5 with Murmur3 for generating datafile 
path hash
......................................................................


Patch Set 3:

> I think we should just figure out how to get rid of the md5 use
 > here. I took a look and I'm really not seeing a benefit to the
 > current approach compared to using the path directly and choosing a
 > better thrift structure.
 >
 > I think if we use the path as the key in the java map, there should
 > be no space overhead - it looks like DataFile.path() will just
 > return a reference to the path String in DataFile - there's no copy
 > or anything.
 >
 > Then for TIcebergTable we don't need to represent it as a map, we
 > can just use a list<THdfsFileDesc> and construct the java map in
 > loadFileDescFromThrift

Yes, I agree with Tim. We can keep just a list<THdfsFileDesc> in thrift and, in 
the loadFileDescFromThrift method, use FileDescriptor.getRelativePath() as the 
map key. My original design only needed the data file path so that it could be 
used for filtering when executing a query. If we can construct this key in code 
instead of storing it as a thrift member, we can remove it from the thrift struct.
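
A rough sketch of the idea, not the actual patch: FileDesc below is a 
hypothetical stand-in for FileDescriptor, and buildPathMap is a made-up helper 
name for the step that loadFileDescFromThrift would perform. It only shows the 
list being turned into a path-keyed map on the Java side.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Impala's FileDescriptor; only the path accessor
// matters for this sketch.
final class FileDesc {
  private final String relativePath;
  private final long length;

  FileDesc(String relativePath, long length) {
    this.relativePath = relativePath;
    this.length = length;
  }

  String getRelativePath() { return relativePath; }
  long getLength() { return length; }
}

final class PathKeyedFileDescs {
  // Builds the lookup map from the list form that would be sent over thrift.
  // The key is the String already held by the descriptor, so no copy is made.
  static Map<String, FileDesc> buildPathMap(List<FileDesc> descs) {
    Map<String, FileDesc> byPath = new HashMap<>(descs.size() * 2);
    for (FileDesc fd : descs) {
      byPath.put(fd.getRelativePath(), fd);
    }
    return byPath;
  }

  public static void main(String[] args) {
    // Assumed example paths, for illustration only.
    List<FileDesc> fromThrift = new ArrayList<>();
    fromThrift.add(new FileDesc("data/part-00000.parquet", 1024L));
    fromThrift.add(new FileDesc("data/part-00001.parquet", 2048L));

    Map<String, FileDesc> byPath = buildPathMap(fromThrift);
    System.out.println(byPath.get("data/part-00001.parquet").getLength());
  }
}
```

With this shape the thrift struct carries only the list, and the key is derived 
when the table is loaded.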


--
To view, visit http://gerrit.cloudera.org:8080/16534
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If7c805f2fdf0cf5a69738579c7e55f4bd047ed59
Gerrit-Change-Number: 16534
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: wangsheng <[email protected]>
Gerrit-Comment-Date: Tue, 06 Oct 2020 02:34:13 +0000
Gerrit-HasComments: No
