Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15454 )
Change subject: IMPALA-3766: optionally compress spilled data ...................................................................... Patch Set 9: (5 comments) http://gerrit.cloudera.org:8080/#/c/15454/9/be/src/runtime/bufferpool/buffer-pool-counters.h File be/src/runtime/bufferpool/buffer-pool-counters.h: http://gerrit.cloudera.org:8080/#/c/15454/9/be/src/runtime/bufferpool/buffer-pool-counters.h@59 PS9, Line 59: RuntimeProfile::Counter* uncompressed_bytes_written; > why add this here vs the TmpFileMgr? That's a good point. This is a per-operator counter versus a per-backend counter. Maybe it is more useful at the backend level, I'm not sure that understanding the compressibility of the data per-operator is all that useful. I'm also realising that it would be useful to have encryption and compression time per operator, because those contribute to the CPU time of that operator. So maybe I should do this: * move uncompress bytes to just being at the backend level * add compression time to both levels * add encryption time to the operator level (it's already at the backend level) http://gerrit.cloudera.org:8080/#/c/15454/9/be/src/runtime/tmp-file-mgr-internal.h File be/src/runtime/tmp-file-mgr-internal.h: http://gerrit.cloudera.org:8080/#/c/15454/9/be/src/runtime/tmp-file-mgr-internal.h@96 PS9, Line 96: int64_t bytes_allocated_ = 0; > would it be clearer to rename this to 'file_write_offset_' I went with allocation_offset_ http://gerrit.cloudera.org:8080/#/c/15454/9/be/src/runtime/tmp-file-mgr.h File be/src/runtime/tmp-file-mgr.h: http://gerrit.cloudera.org:8080/#/c/15454/9/be/src/runtime/tmp-file-mgr.h@157 PS9, Line 157: static constexpr int64_t HOLE_PUNCH_BLOCK_SIZE = 4096; > nit: docs on what this is used for and why it is set to 4K Done http://gerrit.cloudera.org:8080/#/c/15454/9/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: http://gerrit.cloudera.org:8080/#/c/15454/9/be/src/runtime/tmp-file-mgr.cc@61 PS9, Line 61: "most cases this should be used in conjunction with --disk_spill_punch_holes=true " : "to maximize the space saved."); > should we just enforce this? e.g. compression can only be used if hole punc Yeah I think compression + no hole punching is not a particularly useful configuration and a potential way that people could misconfigure it. http://gerrit.cloudera.org:8080/#/c/15454/9/be/src/runtime/tmp-file-mgr.cc@63 PS9, Line 63: disk_spill_compression_buffer_limit_bytes > what happens if compressing a buffer requires more than this many bytes? Updated the comment to explain (I think that's what you were getting at, right?). I also reduced the limit a bit to be more conservative. -- To view, visit http://gerrit.cloudera.org:8080/15454 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9c08ff9504097f0fee8c32316c5c150136abe659 Gerrit-Change-Number: 15454 Gerrit-PatchSet: 9 Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Mon, 23 Mar 2020 23:49:33 +0000 Gerrit-HasComments: Yes