Tim Armstrong has posted comments on this change. (http://gerrit.cloudera.org:8080/16534)

Change subject: IMPALA-10205: Replace MD5 with Murmur3 for generating datafile path hash
......................................................................


Patch Set 1:

I think we should just figure out how to get rid of the MD5 use here entirely. I
took a look, and I don't see a benefit to the current approach compared to using
the path directly and choosing a better Thrift structure.

I think if we use the path as the key in the Java map, there should be no extra
space overhead for the keys: it looks like DataFile.path() just returns a
reference to the path String held in DataFile, so nothing is copied.
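
A minimal sketch of the "path as key" idea, assuming a hypothetical FakeDataFile whose path() returns the String it already holds (the real Iceberg DataFile.path() signature may differ) and a hypothetical FileDesc value type. It shows that the map key is the same String object, not a copy or a freshly computed digest.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PathKeyDemo {
  // Hypothetical stand-in for a data file whose path() returns the stored String.
  static final class FakeDataFile {
    private final String path;
    FakeDataFile(String path) { this.path = path; }
    String path() { return path; }
  }

  // Hypothetical stand-in for a file descriptor value.
  static final class FileDesc {
    final long length;
    FileDesc(long length) { this.length = length; }
  }

  // Index file descriptors by the data file path itself instead of by a hash of it.
  static Map<String, FileDesc> index(List<FakeDataFile> files) {
    Map<String, FileDesc> byPath = new HashMap<>();
    for (FakeDataFile f : files) {
      // The map stores a reference to the existing String; no digest string is built.
      byPath.put(f.path(), new FileDesc(0L));
    }
    return byPath;
  }

  public static void main(String[] args) {
    FakeDataFile f = new FakeDataFile("hdfs://nn/warehouse/t/data/00000-0.parquet");
    Map<String, FileDesc> m = index(List.of(f));
    for (String key : m.keySet()) {
      // Reference equality: the key is the very same object returned by path().
      System.out.println(key == f.path());
    }
  }
}
```

Lookups still use String.equals()/hashCode(), so a caller holding an equal path string (not necessarily the same object) finds the entry.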

Then for TIcebergTable we don't need to represent the file descriptors as a map;
we can just use a list<THdfsFileDesc> and construct the Java map in
loadFileDescFromThrift.
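
A rough sketch of the list-then-rebuild approach, under stated assumptions: TFileDescStub is a hypothetical stand-in for THdfsFileDesc that is assumed to carry its own data file path, and the loadFileDescFromThrift signature shown here is illustrative only, not Impala's actual method.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class IcebergFileDescSketch {
  // Hypothetical stand-in for THdfsFileDesc; assumes the path travels with the descriptor.
  static final class TFileDescStub {
    final String path;
    final long length;
    TFileDescStub(String path, long length) {
      this.path = path;
      this.length = length;
    }
  }

  // Serialization side: send a plain list rather than a map keyed by a path hash.
  static List<TFileDescStub> toThrift(Map<String, TFileDescStub> byPath) {
    return new ArrayList<>(byPath.values());
  }

  // Deserialization side (hypothetical signature): rebuild the Java map keyed by path.
  static Map<String, TFileDescStub> loadFileDescFromThrift(List<TFileDescStub> descs) {
    Map<String, TFileDescStub> byPath = new HashMap<>();
    for (TFileDescStub d : descs) {
      // Paths are expected to be unique within a table snapshot; fail loudly if not.
      if (byPath.put(d.path, d) != null) {
        throw new IllegalStateException("Duplicate data file path: " + d.path);
      }
    }
    return byPath;
  }
}
```

The wire format then carries each path once (inside its descriptor) instead of an extra hash key per entry, and the receiving side pays only the cost of one map insertion per file.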



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If7c805f2fdf0cf5a69738579c7e55f4bd047ed59
Gerrit-Change-Number: 16534
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong
Gerrit-Reviewer: Wenzhe Zhou
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Reviewer: wangsheng
Gerrit-Comment-Date: Sat, 03 Oct 2020 00:27:04 +0000
Gerrit-HasComments: No
