Dan Hecht has posted comments on this change. Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool ......................................................................
Patch Set 8: (20 comments) http://gerrit.cloudera.org:8080/#/c/5811/8/be/src/runtime/buffered-tuple-stream-v2.cc File be/src/runtime/buffered-tuple-stream-v2.cc: PS8, Line 108: if iff, just to be super clear. Line 116: } fine with me to keep, but isn't this the same as line 109? Line 141: ss << &*read_page_; will it be possible to identify which page in the list printed below this is? Line 166: DCHECK(read_page_ == pages_.end()); DCHECK_EQ x2 PS8, Line 219: If the stream is pinned, the previous write page will remain pinned : // with pin count 1 as a read page. If a read/write stream is pinned, the write : // page has pin count 2, so we need to unpin once these two sentences seem to both be talking about pinned streams, but i'm not sure what they are trying to conclude. i also don't understand this if-stmt that follows. Why isn't it simply: if (write_page_ != nullptr) UnpinPage(write_page_); ? i.e. we need to unpin *for write* the page regardless of the state of the read-iterator, no? i.e. can't we separate out reasoning of the read and write iterators? PS8, Line 232: read_page_ != pages_.end() what does this condition mean? why isn't it just: if the stream is pinned for reading, need to take a pin count for both read and write, otherwise need a pin count for only write? Line 241: RETURN_IF_ERROR(buffer_pool_->Pin(buffer_pool_client_, &new_page.handle)); does this purposely not go through PinPage()? PS8, Line 250: (double_pin ? 2 : 1) i guess because of this. maybe move this to be after the CreatePage() call and just add 1 in there and then let PinPage() do its job? Line 348: *pinned = true; why was this needed? PS8, Line 356: We need to double-pin the current write page if we also have a read iterator. why? or maybe you mean just that we need to take a pin-count for reading even if it's already pinned for write? if that's the case, I think we should just talk about "read pinning" and "write pinning" separately (and this routine shouldn't have to muck with write pinning, right?) PS8, Line 358: write_page_ == &page && read_page_ == pages_.end() why? i thought we now do take a separate pin count for read and for write? Line 387: if (pinned_) UnpinPage(&page); do we allow you to call UnpinStream() on an already unpinned stream? why? PS8, Line 388: continue i think this would be clearer as: if (is_write_page || is_read_page) { } else { } rather than the "continue" flow. http://gerrit.cloudera.org:8080/#/c/5811/8/be/src/runtime/buffered-tuple-stream-v2.h File be/src/runtime/buffered-tuple-stream-v2.h: PS8, Line 68: . maybe add: "for reading" PS8, Line 69: access read access? Line 71: /// unpinned and therefore do not use the client's reservation and can be spilled to disk. comments in this paragraph are meant to further clarify that pinned vs. unpinned stream really only applies to the read-iterator. PS8, Line 75: event even PS8, Line 319: bytes_pinned i guess should be BytesPinned() since not a simple accessor PS8, Line 385: including any pages already deleted in 'delete_on_read' : /// mode. that's not intuitive. is there a good reason why we do that rather than make it always be the size of the stream? and if so, seems like it should be called highwater_byte_size or something. PS8, Line 392: num_rows_returned_ rows_returned_? (or rename that member, which might be better). -- To view, visit http://gerrit.cloudera.org:8080/5811 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
