Joe McDonnell has posted comments on this change. Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder ......................................................................
Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/8034/2/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: PS2, Line 213: dict_decoder_(parent_->dictionary_pool_.get()), We should think about having a separate MemPool for parts of the dictionary that will not be referenced by the RowBatch and can be freed at the end of the row group. Since some datatypes are copied by value rather than pointing into the dictionary, this might be used for those as well. http://gerrit.cloudera.org:8080/#/c/8034/2/be/src/util/dict-encoding.h File be/src/util/dict-encoding.h: PS2, Line 229: memcpy(val_ptr, dict_[index], sizeof(T)); The memcpy is doing a dereference here and is not different from *val_ptr = *dict_[index]; PS2, Line 238: std::vector<T*> dict_; Using a T* means that GetValue() must do a dereference to obtain the value. This is a very performance sensitive codepath, so we need to avoid this extra dereference. This should maintain the value directly in the vector. -- To view, visit http://gerrit.cloudera.org:8080/8034 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Pranay Singh Gerrit-HasComments: Yes
