Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15454 )
Change subject: IMPALA-3766: optionally compress spilled data ...................................................................... Patch Set 9: (9 comments) http://gerrit.cloudera.org:8080/#/c/15454/9/be/src/runtime/bufferpool/buffer-pool-counters.h File be/src/runtime/bufferpool/buffer-pool-counters.h: http://gerrit.cloudera.org:8080/#/c/15454/9/be/src/runtime/bufferpool/buffer-pool-counters.h@59 PS9, Line 59: RuntimeProfile::Counter* uncompressed_bytes_written; > Do we need some tests for the new counters as well? I added tests for the bytes written counters. I didn't do that for the timers since the SCOPED_TIMER macro makes it fairly trivial - I manually inspected a profile. http://gerrit.cloudera.org:8080/#/c/15454/9/be/src/runtime/bufferpool/buffer-pool-counters.h@59 PS9, Line 59: RuntimeProfile::Counter* uncompressed_bytes_written; > I think adding the timers makes a lot of sense. I moved the uncompressed bytes to the backend level and made sure there were compression and encryption timers to both. The concern about unaccounted CPU time in an operator was the deciding factor for the second one. http://gerrit.cloudera.org:8080/#/c/15454/11/be/src/runtime/tmp-file-mgr-internal.h File be/src/runtime/tmp-file-mgr-internal.h: http://gerrit.cloudera.org:8080/#/c/15454/11/be/src/runtime/tmp-file-mgr-internal.h@94 PS11, Line 94: This is also the : /// offset of the next chunk to allocate in the file. > maybe move this to the beginning of the comment block? I kinda think now that this sentence doesn't add much, so i deleted it instead of moving it. http://gerrit.cloudera.org:8080/#/c/15454/11/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: http://gerrit.cloudera.org:8080/#/c/15454/11/be/src/runtime/tmp-file-mgr.cc@586 PS11, Line 586: DCHECK(io_mgr_buffer->eosr()); : DCHECK_LE(io_mgr_buffer->len(), read_buffer.len()); : if (io_mgr_ > at this point, the compressed_.buffer() should be a nullptr, right? should I'm not sure what you mean, it has to be non-NULL if it's reading a compressed range. I guess read_buffer does need to be non-null, so I added a DCHECK for that http://gerrit.cloudera.org:8080/#/c/15454/11/be/src/runtime/tmp-file-mgr.cc@644 PS11, Line 644: return status; : } : > not sure I understand what this method is for, but why don't we need to dec This method is used if we wrote out a page from memory to disk, but didn't evict the page, and now are wanting to pin the page again. In that case we don't need to read the data from disk,but we *do* need to decrypt it if the buffer was encrypted in-place. With compression all the compreession and encryption was done in the separate buffer and the original one was unmodified. Updated the comment to be less vague. http://gerrit.cloudera.org:8080/#/c/15454/11/be/src/runtime/tmp-file-mgr.cc@779 PS11, Line 779: write_range_->SetData(buffer_to_write.data(), buffer_to_write.len()); > if this fails, do we still need to call FreeCompressedBuffer()? That's a good point. Adding FreeCompressedbuffer() on all the error paths was getting error-prone, so I switched to using a scoped exit trigger. http://gerrit.cloudera.org:8080/#/c/15454/11/be/src/runtime/tmp-file-mgr.cc@807 PS11, Line 807: > might be cleaner to have return true/false - true if the compression succee Done http://gerrit.cloudera.org:8080/#/c/15454/11/be/src/runtime/tmp-file-mgr.cc@833 PS11, Line 833: file_ = file; This was wrong since we are now allocating MaxOutputLen() - in an earlier patch it allocated a slightly smaller buffer. http://gerrit.cloudera.org:8080/#/c/15454/11/be/src/runtime/tmp-file-mgr.cc@835 PS11, Line 835: Status status = io_ctx->AddWriteRange(write_range_.get()); > is it possible for ProcessBlock to fail, but still write some data to the c That's a good point. I just switched to calling .Release() to be symmetrical with TryAllocate(), I don't think the indirection helped here. I'm not sure I can test this code path. The only case where I think ProcessBlock() could fail on the compression path is if the output buffer is too small, which we avoid by allocating MaxOutputLen(). So I'm not sure if this is actually reachable unless there's a weird internal error in a compressor -- To view, visit http://gerrit.cloudera.org:8080/15454 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9c08ff9504097f0fee8c32316c5c150136abe659 Gerrit-Change-Number: 15454 Gerrit-PatchSet: 9 Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Bikramjeet Vig <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Sahil Takiar <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Thu, 26 Mar 2020 00:56:52 +0000 Gerrit-HasComments: Yes
