Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4883: Union Codegen
......................................................................


Patch Set 5:

(13 comments)

Getting pretty close, just minor cleanup at this point.

I also just wanted to check with Michael that the tuple_pool_ approach made the 
most sense for now - we'll need to clean that up as part of his codegen work 
but I don't think it makes sense to fix in this patch.

http://gerrit.cloudera.org:8080/#/c/6459/4/be/src/exec/union-node-ir.cc
File be/src/exec/union-node-ir.cc:

Line 34: 
> We can avoid checking limits for each row if we check it at the end and tru
It looks like we already do this for the passthrough case so we might as well 
do it here.


http://gerrit.cloudera.org:8080/#/c/6459/5/be/src/exec/union-node-ir.cc
File be/src/exec/union-node-ir.cc:

Line 19: #include "runtime/tuple.h"
Do we need tuple.h? I don't think I see any references to Tuple* in here.


Line 21: #include "util/runtime-profile-counters.h"
Is this needed still?


Line 35:   while (!dst_batch->AtCapacity() && child_row_idx < 
child_batch->num_rows()) {
Nice! We can maybe avoid a few more loads and stores via the child_batch and 
tuple_buf pointers. I.e.

int child_batch_rows = child_batch->num_rows().
uint8_t* curr_tuple = *tuple_buf;
...
*tuple_buf = curr_tuple.


Line 46:   if (limit_ != -1 && num_rows_returned_ + dst_batch->num_rows() > 
limit_) {
We don't need to cross-compile this logic. Let's move it into the caller and 
save LLVM some work.

Although, see my comment about moving this logic to GetNext() and sharing it 
for all three codepaths.


http://gerrit.cloudera.org:8080/#/c/6459/5/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:

Line 168:   if (limit_ != -1 && num_rows_returned_ + row_batch->num_rows() > 
limit_) {
How about we move this logic around num_rows_returned_ and limit_ into 
GetNext()? I believe the same logic can work for all three cases if we slightly 
extend it so that it handles the case when GetNext() is called with a non-empty 
batch, which can happen in a subplan.


http://gerrit.cloudera.org:8080/#/c/6459/1/be/src/exec/union-node.h
File be/src/exec/union-node.h:

PS1, Line 71:  in this poo
> Ok, this can be removed in the future.
Michael, do you think it makes sense to use this approach for now? I'm not that 
familiar with CodegenMaterializeExprs() so not sure if there is a better way to 
do this.


http://gerrit.cloudera.org:8080/#/c/6459/5/be/src/exec/union-node.h
File be/src/exec/union-node.h:

Line 28: #include "runtime/tuple.h"
Do we need the tuple.h and tuple-row.h imports? Oh I guess for the inline 
MaterializeExprs function, but we can move that to the -ir.cc file anyway.


Line 72:   /// each GetNext() call.
We should add a TODO to remove this. Maybe Michael knows if there's a JIRA that 
will allow us to remove it.


PS5, Line 99: Null
NULL here and below, just for consistency with other comments.


PS5, Line 128: row_batch
dst_batch.


Line 136:   void IR_ALWAYS_INLINE MaterializeExprs(const 
std::vector<ExprContext*>& exprs,
Move this to the -ir.cc file? I don't think there's a reason we need to define 
it in the .h


Line 148:   bool inline IsChildPassthrough(int child_idx) const {
I don't think any of the "inline" specifiers here and below do anything - if 
the function is defined in the class body it implicitly has an "inline" hint.

https://isocpp.org/wiki/faq/inline-functions#inline-member-fns-more


-- 
To view, visit http://gerrit.cloudera.org:8080/6459
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib4107d27582ff5416172810364a6e76d3d93c439
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Taras Bobrovytsky <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to