Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15454 )

Change subject: IMPALA-3766: optionally compress spilled data
......................................................................


Patch Set 9:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/15454/9/be/src/runtime/bufferpool/buffer-pool-counters.h
File be/src/runtime/bufferpool/buffer-pool-counters.h:

http://gerrit.cloudera.org:8080/#/c/15454/9/be/src/runtime/bufferpool/buffer-pool-counters.h@59
PS9, Line 59:   RuntimeProfile::Counter* uncompressed_bytes_written;
> Do we need some tests for the new counters as well?
I added tests for the bytes written counters. I didn't do that for the timers 
since the SCOPED_TIMER macro makes it fairly trivial - I manually inspected a 
profile.


http://gerrit.cloudera.org:8080/#/c/15454/9/be/src/runtime/bufferpool/buffer-pool-counters.h@59
PS9, Line 59:   RuntimeProfile::Counter* uncompressed_bytes_written;
> I think adding the timers makes a lot of sense.
I moved the uncompressed bytes to the backend level and made sure there were 
compression and encryption timers to both. The concern about unaccounted CPU 
time in an operator was the deciding factor for the second one.


http://gerrit.cloudera.org:8080/#/c/15454/11/be/src/runtime/tmp-file-mgr-internal.h
File be/src/runtime/tmp-file-mgr-internal.h:

http://gerrit.cloudera.org:8080/#/c/15454/11/be/src/runtime/tmp-file-mgr-internal.h@94
PS11, Line 94: This is also the
             :   /// offset of the next chunk to allocate in the file.
> maybe move this to the beginning of the comment block?
I kinda think now that this sentence doesn't add much, so i deleted it instead 
of moving it.


http://gerrit.cloudera.org:8080/#/c/15454/11/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/15454/11/be/src/runtime/tmp-file-mgr.cc@586
PS11, Line 586:  DCHECK(io_mgr_buffer->eosr());
              :   DCHECK_LE(io_mgr_buffer->len(), read_buffer.len());
              :   if (io_mgr_
> at this point, the compressed_.buffer() should be a nullptr, right? should
I'm not sure what you mean, it has to be non-NULL if it's reading a compressed 
range. I guess read_buffer does need to be non-null, so I added a DCHECK for 
that


http://gerrit.cloudera.org:8080/#/c/15454/11/be/src/runtime/tmp-file-mgr.cc@644
PS11, Line 644: return status;
              : }
              :
> not sure I understand what this method is for, but why don't we need to dec
This method is used if we wrote out a page from memory to disk, but didn't 
evict the page, and now are wanting to pin the page again. In that case we 
don't need to read the data from disk,but we *do* need to decrypt it if the 
buffer was encrypted in-place. With compression all the compreession and 
encryption was done in the separate buffer and the original one was unmodified.

Updated the comment to be less vague.


http://gerrit.cloudera.org:8080/#/c/15454/11/be/src/runtime/tmp-file-mgr.cc@779
PS11, Line 779:   write_range_->SetData(buffer_to_write.data(), 
buffer_to_write.len());
> if this fails, do we still need to call FreeCompressedBuffer()?
That's a good point. Adding FreeCompressedbuffer() on all the error paths was 
getting error-prone, so I switched to using a scoped exit trigger.


http://gerrit.cloudera.org:8080/#/c/15454/11/be/src/runtime/tmp-file-mgr.cc@807
PS11, Line 807:
> might be cleaner to have return true/false - true if the compression succee
Done


http://gerrit.cloudera.org:8080/#/c/15454/11/be/src/runtime/tmp-file-mgr.cc@833
PS11, Line 833:   file_ = file;
This was wrong since we are now allocating MaxOutputLen() - in an earlier patch 
it allocated a slightly smaller buffer.


http://gerrit.cloudera.org:8080/#/c/15454/11/be/src/runtime/tmp-file-mgr.cc@835
PS11, Line 835:   Status status = io_ctx->AddWriteRange(write_range_.get());
> is it possible for ProcessBlock to fail, but still write some data to the c
That's a good point. I just switched to calling .Release() to be symmetrical 
with TryAllocate(), I don't think the indirection helped here.

I'm not sure I can test this code path. The only case where I think 
ProcessBlock() could fail on the compression path is if the output buffer is 
too small, which we avoid by allocating MaxOutputLen(). So I'm not sure if this 
is actually reachable unless there's a weird internal error in a compressor



--
To view, visit http://gerrit.cloudera.org:8080/15454
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9c08ff9504097f0fee8c32316c5c150136abe659
Gerrit-Change-Number: 15454
Gerrit-PatchSet: 9
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Thu, 26 Mar 2020 00:56:52 +0000
Gerrit-HasComments: Yes

Reply via email to