[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder

Bikramjeet Vig (Code Review) Wed, 04 Oct 2017 15:31:13 -0700

Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8034 )


Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
......................................................................


Patch Set 6:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/exec/hdfs-parquet-scanner.h
File be/src/exec/hdfs-parquet-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/exec/hdfs-parquet-scanner.h@485
PS6, Line 485:   /// Tracks all the memory allocation used by dictionary in 
DictDecoder.
             :   struct DictMemTrack {
             :     std::unique_ptr<MemTracker> mem_tracker;
             :
             :     DictMemTrack(MemTracker *parent_memtrack):mem_tracker(new 
MemTracker(-1,
             :         "Decoder parquet dict", parent_memtrack, false)) { };
             :
             :     MemTracker*  GetMemTracker() {
             :       return  mem_tracker.get();
             :     }
             :
             :     ~DictMemTrack() {
             :       mem_tracker->CloseAndUnregisterFromParent();
             :     }
             :   };
I dont think we need to add a new struct for this, you can just call 
CloseAndUnregisterFromParent() on the dict_mem_tracker in 
HdfsParquetScanner::Close

Similarly for the writer


http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/exec/hdfs-parquet-table-writer.cc@113
PS6, Line 113:  // Bytes used for memory tracking by dictionary in DictEncoder 
is decremented.
             :   virtual void Cleanup() {
             :     if (dict_encoder_base_ != nullptr) {
             :       dict_encoder_base_->Cleanup();
             :     }
             :   }
you can move this to the BaseColumnWriter::Close() method


http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/exec/hdfs-parquet-table-writer.cc@767
PS6, Line 767:   // Below code will decrement the bytes used for memory 
tracking by dictionary in
             :   // DictEncoder class for each ColumnWriter.
             :   for (int i = 0; i < columns_.size(); i++) {
             :      if (columns_[i] != nullptr) {
             :        columns_[i]->Cleanup();
             :      }
             :   }
similarly you can move this to the HdfsParquetTableWriter::Close() method


http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/util/dict-encoding.h
File be/src/util/dict-encoding.h:

http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/util/dict-encoding.h@138
PS6, Line 138: SizeofDict
I think we can use a more precise name here, SizeOfDict might be confused with 
returning the number of elements in the Dictionary.
Something on the lines of DictByteSize?


http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/util/dict-encoding.h@438
PS6, Line 438: bytes_alloc += sizeof(value);
can you switch to what joe suggested earlier, that is, instead of bytes_alloc, 
using something like the function SizeofDict() that you wrote to update 
ConsumeBytes at the end.
Refer Joe's first comment.



--
To view, visit http://gerrit.cloudera.org:8080/8034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299
Gerrit-Change-Number: 8034
Gerrit-PatchSet: 6
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Pranay Singh
Gerrit-Comment-Date: Wed, 04 Oct 2017 22:29:59 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder

Reply via email to