Pranay Singh has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8034 )

Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
......................................................................


Patch Set 6:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/exec/hdfs-parquet-scanner.h
File be/src/exec/hdfs-parquet-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/exec/hdfs-parquet-scanner.h@485
PS6, Line 485:   /// Tracks all the memory allocation used by dictionary in 
DictDecoder.
             :   struct DictMemTrack {
             :     std::unique_ptr<MemTracker> mem_tracker;
             :
             :     DictMemTrack(MemTracker *parent_memtrack):mem_tracker(new 
MemTracker(-1,
             :         "Decoder parquet dict", parent_memtrack, false)) { };
             :
             :     MemTracker*  GetMemTracker() {
             :       return  mem_tracker.get();
             :     }
             :
             :     ~DictMemTrack() {
             :       mem_tracker->CloseAndUnregisterFromParent();
             :     }
             :   };
> I dont think we need to add a new struct for this, you can just call CloseA
I had added a new structure to have the creation /destruction of memtracker in 
one place

and also the CloseAndUnregisterFromParent() has to be called after
releasing all bytes owned by MemTracker in the DictDecoder class (which happens 
in it's destructor), since the destructor of member variables are called later 
than the owner making a separate struct/class for DictMemTrack  solved the 
problem.


http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/exec/hdfs-parquet-table-writer.cc@113
PS6, Line 113:  // Bytes used for memory tracking by dictionary in DictEncoder 
is decremented.
             :   virtual void Cleanup() {
             :     if (dict_encoder_base_ != nullptr) {
             :       dict_encoder_base_->Cleanup();
             :     }
             :   }
> you can move this to the BaseColumnWriter::Close() method
This can't be done because I'm not tracking the bytes at the BaseColumnWriter


http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/exec/hdfs-parquet-table-writer.cc@767
PS6, Line 767:   // Below code will decrement the bytes used for memory 
tracking by dictionary in
             :   // DictEncoder class for each ColumnWriter.
             :   for (int i = 0; i < columns_.size(); i++) {
             :      if (columns_[i] != nullptr) {
             :        columns_[i]->Cleanup();
             :      }
             :   }
> similarly you can move this to the HdfsParquetTableWriter::Close() method
I'll try moving the functionality in Close() for HdfsParquetTableWriter/Reader


http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/util/dict-encoding.h
File be/src/util/dict-encoding.h:

http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/util/dict-encoding.h@138
PS6, Line 138: SizeofDict
> I think we can use a more precise name here, SizeOfDict might be confused w
I've changed it to be DictByteSize


http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/util/dict-encoding.h@438
PS6, Line 438: bytes_alloc += sizeof(value);
> can you switch to what joe suggested earlier, that is, instead of bytes_all
This looks fine what is the issue with this?



--
To view, visit http://gerrit.cloudera.org:8080/8034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299
Gerrit-Change-Number: 8034
Gerrit-PatchSet: 6
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Pranay Singh
Gerrit-Comment-Date: Thu, 05 Oct 2017 01:58:59 +0000
Gerrit-HasComments: Yes

Reply via email to