Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5085: large rows in BufferedTupleStreamV2
......................................................................


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6638/5/be/src/runtime/buffered-tuple-stream-v2.cc
File be/src/runtime/buffered-tuple-stream-v2.cc:

PS5, Line 118: (a tricky edge case).
> how so?
Done


Line 340:   }
> this case occurs because we eagerly create write pages. Did you consider ma
The bigger problem is that we'd then be deferring creation of the page until 
after PrepareForWrite() or PrepareForReadWrite(), which would break all the 
client code that assumes PrepareFor*() grabs the memory required to iterate 
over an unpinned stream.

We could do something like have a ReservationTracker embedded in the stream 
where we save the reservation until needed, but it didn't seem like a net 
simplification to me.
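
In case it helps to see the shape of that alternative, here's a toy standalone 
sketch (not the actual BufferedTupleStreamV2 or ReservationTracker code; all 
names below are made up) where PrepareForWrite() only sets the reservation 
aside and the first AddRow() turns it into the write page:

  #include <cstdint>
  #include <iostream>
  #include <memory>
  #include <vector>

  struct Page { std::vector<uint8_t> data; };

  class ToyStream {
   public:
    explicit ToyStream(int64_t default_page_len)
        : default_page_len_(default_page_len) {}

    // Lazy variant: "grab" the reservation now but defer creating the page.
    // Callers can still rely on PrepareForWrite() having secured the memory.
    bool PrepareForWrite() {
      saved_reservation_ = default_page_len_;
      return true;
    }

    // The first AddRow() converts the saved reservation into a write page.
    bool AddRow(const std::vector<uint8_t>& row) {
      if (write_page_ == nullptr) {
        write_page_.reset(new Page());
        write_page_->data.reserve(saved_reservation_);
        saved_reservation_ = 0;
      }
      write_page_->data.insert(write_page_->data.end(), row.begin(), row.end());
      return true;
    }

   private:
    int64_t default_page_len_;
    int64_t saved_reservation_ = 0;
    std::unique_ptr<Page> write_page_;
  };

  int main() {
    ToyStream stream(8 * 1024);
    stream.PrepareForWrite();
    stream.AddRow({1, 2, 3});
    std::cout << "appended one row\n";
    return 0;
  }

The deferred-creation bookkeeping above is what I meant by it not being a net 
simplification.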


Line 378:       && !buffer_pool_client_->IncreaseReservationToFit(read_page_->len())) {
> this comment seems misplaced now, seems better attached to 390.
Done


PS5, Line 380: an the default page length, then unpinning the previous page may
> I don't understand why we do that but then the comment below says "The clie
Yeah, that's true. It would make sense if appending a larger row were "best 
effort" rather than guaranteed to succeed (see my other response in the .h).


PS5, Line 421: Status::OK();
> why is this needed? won't we have at least one page? oh, I guess in the cas
Yeah exactly.


http://gerrit.cloudera.org:8080/#/c/6638/3/be/src/runtime/buffered-tuple-stream-v2.h
File be/src/runtime/buffered-tuple-stream-v2.h:

PS3, Line 212: max_page_len
> Yeah, I'm more just asking for the motivation. I guess it's a guard rail to
Oh, you mean why not just make a best-effort attempt to allocate large pages in 
the stream and depend on the client to ensure enough reservation is available 
for the maximum page size? That's easy enough to implement.

My original thinking was that this would be more deterministic - if we 
encounter a too-large row then it will deterministically fail. Without a max we 
might succeed sometimes and fail other times, depending on how much additional 
reservation the client grabs at runtime. 

That may not matter too much, since success/failure is not totally deterministic 
system-wide anyway: the large rows could be filtered out by runtime filters or 
limits, and whether we append probe-side rows to a stream is also 
non-deterministic because it depends on if and when the join spills.

We could change this so that appending a larger row is a "best-effort" thing 
rather than guaranteed. The main downside seems to be what you mentioned - 
there is no guard rail to prevent one stream from using unlimited memory. I'd 
expect that to be handled correctly - at some point reading or writing a stream 
should fail with an error status if another stream used up too much memory - 
but it does increase the state space more.
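
Roughly, the two policies differ like this (toy code with illustrative names 
only, not the real stream API):

  #include <cstdint>
  #include <iostream>

  // Policy A: deterministic cap. A row larger than max_page_len always fails,
  // regardless of how much reservation happens to be free at runtime.
  bool AddLargeRowWithCap(int64_t row_size, int64_t max_page_len,
                          int64_t available_reservation) {
    if (row_size > max_page_len) return false;  // deterministic failure
    return available_reservation >= row_size;   // caller reserved up to the cap
  }

  // Policy B: best effort. Succeeds whenever enough reservation can be grabbed
  // right now, so the same row may succeed on one run and fail on another,
  // depending on what other operators have consumed.
  bool AddLargeRowBestEffort(int64_t row_size, int64_t available_reservation) {
    return available_reservation >= row_size;
  }

  int main() {
    const int64_t max_page_len = 2 * 1024 * 1024;
    const int64_t huge_row = 4 * 1024 * 1024;
    const int64_t available = 8 * 1024 * 1024;
    std::cout << "cap policy:  "
              << AddLargeRowWithCap(huge_row, max_page_len, available) << "\n";
    std::cout << "best effort: "
              << AddLargeRowBestEffort(huge_row, available) << "\n";
    return 0;
  }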


PS3, Line 254: stream could not increase the reservation
             :   ///   sufficiently
> By "above" i meant lines 87-88: 
Ah ok. Yeah, I think this could be made consistent once we decide whether it's 
best-effort (success depends on whether the callee can get the reservation, but 
the caller can ensure success by making it available) or a hard requirement 
(the callee will fail on unpinned streams if the reservation is not already 
available).
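
For example, the two contracts could be documented roughly along these lines 
(illustrative wording and signatures only, not proposed header text):

  class Status {};    // stand-in for impala::Status
  class TupleRow {};  // stand-in for impala::TupleRow

  class ExampleStream {
   public:
    // Variant 1: best effort.
    /// Tries to append the row. Returns false if the stream could not increase
    /// its reservation enough to fit the row; the caller can guarantee success
    /// by making the extra reservation available before calling.
    bool AddRowBestEffort(TupleRow* row, Status* status);

    // Variant 2: reservation required up front.
    /// Appends the row. For unpinned streams, fails if the reservation needed
    /// for the row is not already available to the stream.
    bool AddRowRequiresReservation(TupleRow* row, Status* status);
  };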


-- 
To view, visit http://gerrit.cloudera.org:8080/6638
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2861c58efa7bc1aeaa5b7e2f043c97cb3985c8f5
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Jim Apple <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
