Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15454 )

Change subject: IMPALA-3766: optionally compress spilled data
......................................................................


Patch Set 9:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/15454/9/be/src/runtime/bufferpool/buffer-pool-counters.h
File be/src/runtime/bufferpool/buffer-pool-counters.h:

http://gerrit.cloudera.org:8080/#/c/15454/9/be/src/runtime/bufferpool/buffer-pool-counters.h@59
PS9, Line 59:   RuntimeProfile::Counter* uncompressed_bytes_written;
> why add this here vs the TmpFileMgr?
That's a good point. This is a per-operator counter versus a per-backend 
counter. Maybe it is more useful at the backend level, I'm not sure that 
understanding the compressibility of the data per-operator is all that useful.

I'm also realising that it would be useful to have encryption and compression 
time per operator, because those contribute to the CPU time of that operator.

So maybe I should do this:
* move uncompress bytes to just being at the backend level
* add compression time to both levels
* add encryption time to the operator level (it's already at the backend level)


http://gerrit.cloudera.org:8080/#/c/15454/9/be/src/runtime/tmp-file-mgr-internal.h
File be/src/runtime/tmp-file-mgr-internal.h:

http://gerrit.cloudera.org:8080/#/c/15454/9/be/src/runtime/tmp-file-mgr-internal.h@96
PS9, Line 96:   int64_t bytes_allocated_ = 0;
> would it be clearer to rename this to 'file_write_offset_'
I went with allocation_offset_


http://gerrit.cloudera.org:8080/#/c/15454/9/be/src/runtime/tmp-file-mgr.h
File be/src/runtime/tmp-file-mgr.h:

http://gerrit.cloudera.org:8080/#/c/15454/9/be/src/runtime/tmp-file-mgr.h@157
PS9, Line 157:   static constexpr int64_t HOLE_PUNCH_BLOCK_SIZE = 4096;
> nit: docs on what this is used for and why it is set to 4K
Done


http://gerrit.cloudera.org:8080/#/c/15454/9/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/15454/9/be/src/runtime/tmp-file-mgr.cc@61
PS9, Line 61:     "most cases this should be used in conjunction with 
--disk_spill_punch_holes=true "
            :     "to maximize the space saved.");
> should we just enforce this? e.g. compression can only be used if hole punc
Yeah I think compression + no hole punching is not a particularly useful 
configuration and a potential way that people could misconfigure it.


http://gerrit.cloudera.org:8080/#/c/15454/9/be/src/runtime/tmp-file-mgr.cc@63
PS9, Line 63: disk_spill_compression_buffer_limit_bytes
> what happens if compressing a buffer requires more than this many bytes?
Updated the comment to explain (I think that's what you were getting at, 
right?). I also reduced the limit a bit to be more conservative.



--
To view, visit http://gerrit.cloudera.org:8080/15454
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9c08ff9504097f0fee8c32316c5c150136abe659
Gerrit-Change-Number: 15454
Gerrit-PatchSet: 9
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Mon, 23 Mar 2020 23:49:33 +0000
Gerrit-HasComments: Yes

Reply via email to