Tim Armstrong has posted comments on this change. Change subject: IMPALA-5085: large rows in BufferedTupleStreamV2 ......................................................................
Patch Set 10: (23 comments) http://gerrit.cloudera.org:8080/#/c/6638/10/be/src/runtime/buffered-tuple-stream-v2.cc File be/src/runtime/buffered-tuple-stream-v2.cc: PS10, Line 120: Exceptions are the page we're currently writing > why is that still an exception now that we have the saved reservation? You're right. For some reason I thought I fixed this code but I obviously didn't. PS10, Line 271: ExpectedPinCount > Now that the complex logic shifts to reservations, should this become "bool This seemed to work out a bit cleaner for the validations (i.e. we can just assert the pin count equals this, rather than asserting is_pinned == NeedPin() && pin_count == 1. PS10, Line 340: has_write_iterator > why is that? should comment say: Need reservation if there are no pages cur Done Line 409: - write_page_reservation_to_reclaim)) { > this can only fail for large pages, right? if so, how about clarifying with Or for pinned streams. I added the DCHECK. Line 424: // free up reservation for the next write page, so do it first. > then why does this not have to happen before the IncreaseReservationToFit() It's not necessary now that we factor that into the IncreaseReservationToFit() calculation. Removed the comment. PS10, Line 505: deleting or unpinning the previous page > this seems like it needs updating. Done PS10, Line 516: (!pinned_ || pages_.size() == 1) && has_read_write_page() > hmm, is this not expressible by NeedWriteReservation()? Done. It's a little clunky but it is probably clearer. PS10, Line 548: InvalidateReadIterator > why do that? don't we still need the saved read reservation? In case there was already a read iterator active. This just resets things to a clear state then sets up the new iterator from scratch. Line 566: read_end_ptr_ = nullptr; > why doesn't this case need to save reservation? It's handled in PrepareForReadWrite(). It's unclear whether it's cleaner to handle this here or in the caller. The current approach is slightly less code. PS10, Line 713: read_page_ == pages_.end() > why do we do NextReadPage() if we're already at the end? pages_.end() is just the general-purpose invalid iterator value. The added case handles moving to the first page of a read/write stream. PS10, Line 866: so : // we can just use AllocateRowSlow() to do the work of advancing the page. > i don't understand why this follows from the first part of the sentence. I don't think it was very helpful. Shortened the comment. PS10, Line 893: pinned_ > shouldn't that be expressed in terms of NeedWriteReservation()? Done http://gerrit.cloudera.org:8080/#/c/6638/10/be/src/runtime/buffered-tuple-stream-v2.h File be/src/runtime/buffered-tuple-stream-v2.h: PS10, Line 77: In the unpinned mode, only : /// pages currently being read and written are pinned and other pages are unpinned and : /// therefore do not use the client's reservation and can be spilled to disk. : /// > it seems like this sentence should be combined with the next paragraph, sin Merged the paragraphs. PS10, Line 89: caller > client? Done Line 278: /// but std::function always makes a heap allocation. > explain what this callback should do. Done PS10, Line 281: 'fixed_size' and variable : /// length data 'varlen_size' > update Done Line 283: /// start of the allocated space and returns true. Otherwise returns false. > does this have the same three outcomes as AddRow? Done PS10, Line 284: AllocateRow > At this point the "Add" and "Allocate" row terminology seems historical. Le AddRow() still seems reasonable. AddRowCustom(), AddRowUnsafe()? Line 484: /// * there is only one pinned page in the stream and it is already pinned for reading. > this comment is useful, but it'd also be helpful to explain what these thre Done PS10, Line 535: and at the byte before 'data_end'. > what does that mean? was missing "ending" PS10, Line 536: updates *data_end > data_end doesn't have enough indirection. is that suppose to say '*data'? Done Line 582: void InvalidateWriteIterator(); > it's a bit hard to understand how these "invalidate" methods are different Folded ResetReadPage() into InvalidateReadIterator(). Updated the ResetWritePage() comment to reflect the logical position of the iterator after it returns. http://gerrit.cloudera.org:8080/#/c/6638/10/be/src/runtime/buffered-tuple-stream-v2.inline.h File be/src/runtime/buffered-tuple-stream-v2.inline.h: Line 34: inline bool BufferedTupleStreamV2::AllocateRow( > why is it that AllocateRow() fast path is inlined but AddRow() is not? AddRow() calls into the interpreted DeepCopy() anyway on the fast path so there's not really a benefit, whereas AllocateRow() doesn't do as much work and benefits from inlining write_fn() -- To view, visit http://gerrit.cloudera.org:8080/6638 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2861c58efa7bc1aeaa5b7e2f043c97cb3985c8f5 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Jim Apple <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
