Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4674: Part 2: port backend exec to BufferPool
......................................................................


Patch Set 23:

(27 comments)

http://gerrit.cloudera.org:8080/#/c/5801/23/be/src/exec/analytic-eval-node.cc
File be/src/exec/analytic-eval-node.cc:

PS23, Line 194: !buffer_pool_client_.is_registered()
> I guess that's needed for subplans?
Yeah exactly. It's a little ugly but I couldn't think of a cleaner way to deal 
with this.


http://gerrit.cloudera.org:8080/#/c/5801/23/be/src/exec/analytic-eval-node.h
File be/src/exec/analytic-eval-node.h:

PS23, Line 336: reservation
> the stream will try to grow the reservation first, right?
Done


http://gerrit.cloudera.org:8080/#/c/5801/23/be/src/exec/exec-node.cc
File be/src/exec/exec-node.cc:

Line 209:         &buffer_pool_client_, resource_profile_.min_reservation);
> do we somewhere inside there dcheck that the exec-node has at least min_res
No. I'm not sure that it's a useful check either - if the ExecNode 
implementation wants to give up reservation earlier that seems OK.


Line 226:       resource_profile_.spillable_buffer_size < 
buffer_pool->min_buffer_len()) {
> when can that happen?
It's possible if the coordinator has a different configuration from the 
backends. It seems unlikely that anyone would configure it in that way but who 
knows.


Line 233:   // to get some reservation.
> I'm not sure what that comment is saying.
Yeah it was garbled. I don't think it's needed.


PS23, Line 240: if (resource_profile_.min_reservation > 0) {
> maybe delete that to exercise this path unconditionally
Done


http://gerrit.cloudera.org:8080/#/c/5801/23/be/src/exec/hash-table.h
File be/src/exec/hash-table.h:

PS23, Line 948: 
> is that solved or was it not a problem?
This was IMPALA-3899, which is fixed. This comment just didn't get updated.


PS23, Line 532: Either a pointer to the single tuple of this row 
> where does the tuple live in that case?
Reworded the comment to be clearer (hopefully).


PS23, Line 605: success
> that's kind of confusing given the OK on status means success also. Maybe m
Yeah got_memory is clearer.


PS23, Line 617:   
> blank
Done


PS23, Line 625: Status* status
> do we need to pass that indirectly for codegen/perf still? i thought we mad
I tried to do this a while back with BTS::AddRow() but it was a can of worms.

The problem was mainly codegen time, IIRC the runtime difference wasn't 
significant. In a lot of cases the compiler couldn't infer that the destructor 
for the automatic Status variable was dead code so the IR ends up with lots of 
unnecessary branches from the Status destructors.


PS23, Line 688: success
> same comment
Done


PS23, Line 854: Status* status
> same question about status perf. otherwise, seems better to pass HtData**
See above comment.


PS23, Line 862: success
> same comment
changed to got_memory


PS23, Line 879: status
> same comment vs DuplicateNode**
See above comment.


PS23, Line 895: tatus* status)
> this is opposite all the "bool* success" interfaces. reason for that?
Same as the other ones - this gets inlined into the IR and the automatic Status 
causes problems.


PS23, Line 923: stores_tuples_
> this could probably use a comment (the comment above seems to apply to the 
Yeah Tuple* is significantly faster at least until GetTupleRow() is codegen'd.

Added a comment.


http://gerrit.cloudera.org:8080/#/c/5801/23/be/src/exec/partitioned-aggregation-node-ir.cc
File be/src/exec/partitioned-aggregation-node-ir.cc:

Line 224:   if (hash_tbl == nullptr) return false; // Hash table was not 
created - pass through.
> why does that happen now? or is it a bug fix?
This was required so that preaggs can execute with zero reservation.

Before we "got away" with allocating small hash tables and streams because the 
hash table and small buffer memory wasn't counted against the reservations.


http://gerrit.cloudera.org:8080/#/c/5801/23/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

PS23, Line 88: page
> should we express that as min_buffer_size instead?  maybe easy to understan
Yeah that would be more consistent.

There was already a min_buffer_size flag in disk-io-mgr so I combined the two - 
they'll be merged anyway when we switch the io-mgr over to the buffer pool.


http://gerrit.cloudera.org:8080/#/c/5801/23/be/src/runtime/initial-reservations.cc
File be/src/runtime/initial-reservations.cc:

Line 69:   DCHECK(success);
> Explain why success is expected (planner computation).
Done


http://gerrit.cloudera.org:8080/#/c/5801/23/be/src/runtime/initial-reservations.h
File be/src/runtime/initial-reservations.h:

PS23, Line 40:  This creates trackers for initial reservations under those.
> This seems similar to a SubReservation, no?
Right, they're both wrappers around ReservationTrackers but a SubReservation is 
associated with a BufferPool Client, and the operations are a bit different.


PS23, Line 47: Claim
> claim it from where? How is this different than Claim() claiming? also what
Yeah this comment needed work. Rewrote it and renamed peak_reservation_claim 
and initial_reservation_sum above to align with the names we chose elsewhere.


PS23, Line 48: that be claimed
> garbled
Done


PS23, Line 48: but not
             :   /// returned
> why not returned?
Reworded comment.


PS23, Line 50: Status
> explain failure modes
Done


http://gerrit.cloudera.org:8080/#/c/5801/23/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

PS23, Line 403: 
> in order to not regress w.r.t. IMPALA-1575, don't we need to release the in
In most cases it's probably a lot better, since it holds onto unclaimed initial 
reservations only, rather than all of the buffers ever allocated by the block 
mgr.

I think in some cases this could hold onto the memory longer though, since the 
initial reservation is released when the QueryState releases its resources, 
rather than when the last RuntimeState releases its resources. The difference 
is that the coordinator holds a reference to the QueryState.

It's a little tricky to fix. We could refcount the number of fragment instances 
and then release the initial reservation when they all finish. Thinking ahead 
though, that doesn't really work if the coordinator needs reservation for 
things like runtime filters, but I guess we could also hold a refcount for that 
until it was claimed.

What do you think about something like that - maybe adding a 
initial_reservation_refcount_ field to QueryState and releasing the initial 
reservation when it goes to zero.


http://gerrit.cloudera.org:8080/#/c/5801/23/fe/src/main/java/org/apache/impala/util/MathUtil.java
File fe/src/main/java/org/apache/impala/util/MathUtil.java:

Line 45: }
> look familar
Yeah I moved this into the resource profile patch. After the rebase onto the 
patch this is no longer needed.


-- 
To view, visit http://gerrit.cloudera.org:8080/5801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e
Gerrit-PatchSet: 23
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to