Tim Armstrong has posted comments on this change. Change subject: IMPALA-4674: Part 2: port backend exec to BufferPool ......................................................................
Patch Set 23: (27 comments) http://gerrit.cloudera.org:8080/#/c/5801/23/be/src/exec/analytic-eval-node.cc File be/src/exec/analytic-eval-node.cc: PS23, Line 194: !buffer_pool_client_.is_registered() > I guess that's needed for subplans? Yeah exactly. It's a little ugly but I couldn't think of a cleaner way to deal with this. http://gerrit.cloudera.org:8080/#/c/5801/23/be/src/exec/analytic-eval-node.h File be/src/exec/analytic-eval-node.h: PS23, Line 336: reservation > the stream will try to grow the reservation first, right? Done http://gerrit.cloudera.org:8080/#/c/5801/23/be/src/exec/exec-node.cc File be/src/exec/exec-node.cc: Line 209: &buffer_pool_client_, resource_profile_.min_reservation); > do we somewhere inside there dcheck that the exec-node has at least min_res No. I'm not sure that it's a useful check either - if the ExecNode implementation wants to give up reservation earlier that seems OK. Line 226: resource_profile_.spillable_buffer_size < buffer_pool->min_buffer_len()) { > when can that happen? It's possible if the coordinator has a different configuration from the backends. It seems unlikely that anyone would configure it in that way but who knows. Line 233: // to get some reservation. > I'm not sure what that comment is saying. Yeah it was garbled. I don't think it's needed. PS23, Line 240: if (resource_profile_.min_reservation > 0) { > maybe delete that to exercise this path unconditionally Done http://gerrit.cloudera.org:8080/#/c/5801/23/be/src/exec/hash-table.h File be/src/exec/hash-table.h: PS23, Line 948: > is that solved or was it not a problem? This was IMPALA-3899, which is fixed. This comment just didn't get updated. PS23, Line 532: Either a pointer to the single tuple of this row > where does the tuple live in that case? Reworded the comment to be clearer (hopefully). PS23, Line 605: success > that's kind of confusing given the OK on status means success also. Maybe m Yeah got_memory is clearer. PS23, Line 617: > blank Done PS23, Line 625: Status* status > do we need to pass that indirectly for codegen/perf still? i thought we mad I tried to do this a while back with BTS::AddRow() but it was a can of worms. The problem was mainly codegen time, IIRC the runtime difference wasn't significant. In a lot of cases the compiler couldn't infer that the destructor for the automatic Status variable was dead code so the IR ends up with lots of unnecessary branches from the Status destructors. PS23, Line 688: success > same comment Done PS23, Line 854: Status* status > same question about status perf. otherwise, seems better to pass HtData** See above comment. PS23, Line 862: success > same comment changed to got_memory PS23, Line 879: status > same comment vs DuplicateNode** See above comment. PS23, Line 895: tatus* status) > this is opposite all the "bool* success" interfaces. reason for that? Same as the other ones - this gets inlined into the IR and the automatic Status causes problems. PS23, Line 923: stores_tuples_ > this could probably use a comment (the comment above seems to apply to the Yeah Tuple* is significantly faster at least until GetTupleRow() is codegen'd. Added a comment. http://gerrit.cloudera.org:8080/#/c/5801/23/be/src/exec/partitioned-aggregation-node-ir.cc File be/src/exec/partitioned-aggregation-node-ir.cc: Line 224: if (hash_tbl == nullptr) return false; // Hash table was not created - pass through. > why does that happen now? or is it a bug fix? This was required so that preaggs can execute with zero reservation. Before we "got away" with allocating small hash tables and streams because the hash table and small buffer memory wasn't counted against the reservations. http://gerrit.cloudera.org:8080/#/c/5801/23/be/src/runtime/exec-env.cc File be/src/runtime/exec-env.cc: PS23, Line 88: page > should we express that as min_buffer_size instead? maybe easy to understan Yeah that would be more consistent. There was already a min_buffer_size flag in disk-io-mgr so I combined the two - they'll be merged anyway when we switch the io-mgr over to the buffer pool. http://gerrit.cloudera.org:8080/#/c/5801/23/be/src/runtime/initial-reservations.cc File be/src/runtime/initial-reservations.cc: Line 69: DCHECK(success); > Explain why success is expected (planner computation). Done http://gerrit.cloudera.org:8080/#/c/5801/23/be/src/runtime/initial-reservations.h File be/src/runtime/initial-reservations.h: PS23, Line 40: This creates trackers for initial reservations under those. > This seems similar to a SubReservation, no? Right, they're both wrappers around ReservationTrackers but a SubReservation is associated with a BufferPool Client, and the operations are a bit different. PS23, Line 47: Claim > claim it from where? How is this different than Claim() claiming? also what Yeah this comment needed work. Rewrote it and renamed peak_reservation_claim and initial_reservation_sum above to align with the names we chose elsewhere. PS23, Line 48: that be claimed > garbled Done PS23, Line 48: but not : /// returned > why not returned? Reworded comment. PS23, Line 50: Status > explain failure modes Done http://gerrit.cloudera.org:8080/#/c/5801/23/be/src/runtime/runtime-state.h File be/src/runtime/runtime-state.h: PS23, Line 403: > in order to not regress w.r.t. IMPALA-1575, don't we need to release the in In most cases it's probably a lot better, since it holds onto unclaimed initial reservations only, rather than all of the buffers ever allocated by the block mgr. I think in some cases this could hold onto the memory longer though, since the initial reservation is released when the QueryState releases its resources, rather than when the last RuntimeState releases its resources. The difference is that the coordinator holds a reference to the QueryState. It's a little tricky to fix. We could refcount the number of fragment instances and then release the initial reservation when they all finish. Thinking ahead though, that doesn't really work if the coordinator needs reservation for things like runtime filters, but I guess we could also hold a refcount for that until it was claimed. What do you think about something like that - maybe adding a initial_reservation_refcount_ field to QueryState and releasing the initial reservation when it goes to zero. http://gerrit.cloudera.org:8080/#/c/5801/23/fe/src/main/java/org/apache/impala/util/MathUtil.java File fe/src/main/java/org/apache/impala/util/MathUtil.java: Line 45: } > look familar Yeah I moved this into the resource profile patch. After the rebase onto the patch this is no longer needed. -- To view, visit http://gerrit.cloudera.org:8080/5801 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e Gerrit-PatchSet: 23 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
