Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5085: large rows in BufferedTupleStreamV2
......................................................................
Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6638/5/be/src/runtime/buffered-tuple-stream-v2.cc
File be/src/runtime/buffered-tuple-stream-v2.cc:

PS5, Line 118: (a tricky edge case).
> how so?
Done


Line 340: }
> this case occurs because we eagerly create write pages. Did you consider ma
The bigger problem is that we'd then be deferring creation of the page until after PrepareForWrite() or PrepareForReadWrite(), which would break all the client code that assumes PrepareFor*() grabs the memory required to iterate over an unpinned stream. We could do something like embed a ReservationTracker in the stream where we save the reservation until it is needed, but that didn't seem like a net simplification to me (a rough sketch of the idea is appended below).


Line 378: && !buffer_pool_client_->IncreaseReservationToFit(read_page_->len())) {
> this comment seems misplaced now, seems better attached to 390.
Done


PS5, Line 380: an the default page length, then unpinning the previous page may
> I don't understand why we do that but then the comment below says "The clie
Yeah, that's true. It would make sense if appending a larger row were "best effort" rather than guaranteed to succeed (see my other response in the .h).


PS5, Line 421: Status::OK();
> why is this needed? won't we have at least one page? oh, I guess in the cas
Yeah, exactly.


http://gerrit.cloudera.org:8080/#/c/6638/3/be/src/runtime/buffered-tuple-stream-v2.h
File be/src/runtime/buffered-tuple-stream-v2.h:

PS3, Line 212: max_page_len
> Yeah, I'm more just asking for the motivation. I guess it's a guard rail to
Oh, you mean: why not just make a best-effort attempt to allocate large pages in the stream and depend on the client to ensure enough reservation is available for the maximum page size? That's easy enough to implement.

My original thinking was that a max would be more deterministic: if we encounter a too-large row, appending it deterministically fails. Without a max we might succeed sometimes and fail other times, depending on how much additional reservation the client grabs at runtime. That may not matter much, since success/failure is not totally deterministic system-wide anyway: the large rows could be filtered out by runtime filters or limits, and whether we append probe-side rows to a stream is also non-deterministic because it depends on if and when the join spills.

We could change this so that appending a larger row is "best effort" rather than guaranteed. The main downside is the one you mentioned: there is no guard rail to prevent one stream from using unlimited memory. I'd expect that to be handled correctly - at some point reading or writing a stream should fail with an error status if another stream used up too much memory - but it does increase the state space. (The second sketch appended below contrasts the two policies.)


PS3, Line 254: stream could not increase the reservation
             : /// sufficiently
> By "above" i meant lines 87-88:
Ah, ok. Yeah, I think this could be made consistent once we decide whether it's best-effort (success depends on whether the callee can obtain the reservation, but the caller can guarantee success by ensuring it is available) or required (the callee will fail on unpinned streams if the reservation is not already available).
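
For reference, here is a minimal standalone sketch of the "embedded reservation" alternative from the Line 340 comment: save the reservation at PrepareForWrite() time and consume it when the write page is lazily created, so clients keep the guarantee that PrepareFor*() grabs the memory. FakeReservation, FakeStream, and all of their members are hypothetical names for illustration only, not Impala's actual BufferPool or BufferedTupleStreamV2 API.

    #include <cassert>
    #include <cstdint>
    #include <iostream>

    // Hypothetical stand-in for the buffer pool's reservation accounting.
    struct FakeReservation {
      int64_t saved = 0;
      bool Save(int64_t bytes) { saved += bytes; return true; }  // assume success
      void Consume(int64_t bytes) { assert(saved >= bytes); saved -= bytes; }
    };

    // Hypothetical stream that defers write-page creation but still grabs the
    // memory up front in PrepareForWrite(), preserving the client contract.
    class FakeStream {
     public:
      explicit FakeStream(int64_t default_page_len)
          : default_page_len_(default_page_len) {}

      // Saves the reservation now; the page itself is created lazily.
      bool PrepareForWrite() { return reservation_.Save(default_page_len_); }

      bool AddRow(int64_t row_size) {
        if (!have_write_page_) {
          // Lazily create the page from the reservation saved earlier.
          reservation_.Consume(default_page_len_);
          have_write_page_ = true;
        }
        return row_size <= default_page_len_;  // large-row handling elided
      }

     private:
      const int64_t default_page_len_;
      FakeReservation reservation_;
      bool have_write_page_ = false;
    };

    int main() {
      FakeStream stream(8 * 1024);
      assert(stream.PrepareForWrite());  // memory guaranteed here, no page yet
      assert(stream.AddRow(128));        // page materialized on first append
      std::cout << "ok\n";
    }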
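
And a companion sketch contrasting the two policies discussed at PS3, Line 212: a max_page_len guard rail that fails too-large rows deterministically, versus a best-effort append whose outcome depends on the reservation state at runtime. Again, FakeClient, IncreaseToFit(), and the helper functions are hypothetical stand-ins, not the real client API.

    #include <cassert>
    #include <cstdint>

    // Hypothetical client-side reservation; IncreaseToFit() may fail at
    // runtime depending on how much memory other streams have claimed.
    struct FakeClient {
      int64_t reservation = 0;
      int64_t limit = 64 * 1024;
      bool IncreaseToFit(int64_t bytes) {
        if (reservation + bytes > limit) return false;
        reservation += bytes;
        return true;
      }
    };

    // Guard-rail policy: rows larger than max_page_len always fail, no matter
    // how much reservation happens to be available.
    bool AddLargeRowWithMax(FakeClient* client, int64_t row_size,
                            int64_t max_page_len) {
      if (row_size > max_page_len) return false;  // deterministic failure
      return client->IncreaseToFit(row_size);
    }

    // Best-effort policy: no cap; success depends on runtime reservation state.
    bool AddLargeRowBestEffort(FakeClient* client, int64_t row_size) {
      return client->IncreaseToFit(row_size);
    }

    int main() {
      FakeClient client;
      // With a max, a 1MB row always fails, regardless of available memory.
      assert(!AddLargeRowWithMax(&client, 1 << 20, 32 * 1024));
      // Best effort: the same row succeeds or fails based on the current limit.
      assert(!AddLargeRowBestEffort(&client, 1 << 20));
      assert(AddLargeRowBestEffort(&client, 16 * 1024));
    }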
--
To view, visit http://gerrit.cloudera.org:8080/6638
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2861c58efa7bc1aeaa5b7e2f043c97cb3985c8f5
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Jim Apple <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
