wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/16534 )
Change subject: IMPALA-10205: Replace MD5 with Murmur3 for generating datafile path hash
......................................................................


Patch Set 3:

> I think we should just figure out how to get rid of the md5 use
> here. I took a look and I'm really not seeing a benefit to the
> current approach compared to using the path directly and choosing a
> better thrift structure.
>
> I think if we use the path as the key in the java map, there should
> be no space overhead - it looks like DataFile.path() will just
> return a reference to the path String in DataFile - there's no copy
> or anything.
>
> Then for TIcebergTable we don't need to represent it as a map, we
> can just use a list<THdfsFileDesc> and construct the java map in
> loadFileDescFromThrift

Yes, I agree with Tim. We can keep just a list<THdfsFileDesc> in thrift and, in the loadFileDescFromThrift method, use FileDescriptor.getRelativePath() as the map key. My original design only intended to use the data file path for filtering when executing a query. If we can construct this key in code instead of storing it as a thrift member, we can remove it from the thrift struct.

--
To view, visit http://gerrit.cloudera.org:8080/16534
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If7c805f2fdf0cf5a69738579c7e55f4bd047ed59
Gerrit-Change-Number: 16534
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: wangsheng <[email protected]>
Gerrit-Comment-Date: Tue, 06 Oct 2020 02:34:13 +0000
Gerrit-HasComments: No
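
For illustration only, the approach agreed on in the comment above (carry a plain list of file descriptors over thrift, then rebuild the path-keyed map on the Java side) might look roughly like the sketch below. This is not Impala's actual code: `FileDescMapSketch`, `FileDesc`, and `buildPathMap` are hypothetical stand-ins, and the only thing taken from the discussion is the idea of keying the map by the descriptor's relative path instead of an MD5 or Murmur3 hash.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these types are Impala's real classes.
final class FileDescMapSketch {

  // Minimal stand-in for a file descriptor deserialized from thrift.
  public static final class FileDesc {
    private final String relativePath;
    private final long length;

    public FileDesc(String relativePath, long length) {
      this.relativePath = relativePath;
      this.length = length;
    }

    public String getRelativePath() { return relativePath; }
    public long getLength() { return length; }
  }

  // Rebuilds the lookup map from the flat list. The key is the path String
  // already held by each descriptor, so no hash is computed or stored and
  // the map only holds references to existing objects.
  public static Map<String, FileDesc> buildPathMap(List<FileDesc> descs) {
    Map<String, FileDesc> byPath = new HashMap<>();
    for (FileDesc fd : descs) {
      byPath.put(fd.getRelativePath(), fd);
    }
    return byPath;
  }

  public static void main(String[] args) {
    // Example paths are made up for this sketch.
    List<FileDesc> fromThrift = Arrays.asList(
        new FileDesc("data/part-0.parquet", 128L),
        new FileDesc("data/part-1.parquet", 256L));

    Map<String, FileDesc> byPath = buildPathMap(fromThrift);

    // Look up a descriptor by data file path, e.g. when filtering at query time.
    FileDesc hit = byPath.get("data/part-1.parquet");
    System.out.println(hit.getRelativePath() + " has length " + hit.getLength());
  }
}
```

Under these assumptions the thrift struct would only need the list, since the key can always be derived from each descriptor when the map is rebuilt.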
