Michael Ho has posted comments on this change.

Change subject: IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder

Patch Set 19:


File be/src/exec/blocking-join-node.cc:

PS14, Line 297: SCOPED_TIMER(build_timer_);
> It's not clear to me that the original timers were well thought out. It did
I am mostly concerned about preserving "BuildPartitionTime" and "BuildTime" 
which exist in the profile now. The former seems to measure the total time to 
partition all input rows into their partitions while the latter measures the 
total time to insert the non-spilled partitions into their corresponding hash 

File be/src/exec/partitioned-hash-join-builder.cc:

PS19, Line 377: is

File be/src/exec/partitioned-hash-join-node.cc:

PS14, Line 87: 
> They are redundant currently, but the idea is that PhjBuilder and PhjNode w
OK. To be able to probe the same hash table in parallel, it's necessary to have 
the thread-private build_expr_ctx_.

File be/src/exec/partitioned-hash-join-node.cc:

Line 77:   builder_.reset(
It helps to add a comment like:

TODO: For now, the build_side_expr_ may be duplicated for now but the builder_ 
need to have the build_side_expr_ once the builder is separated out into a 
different fragment.

PS19, Line 479:     if (!output_build_partitions_.empty()) {
              :       DCHECK(NeedToProcessUnmatchedBuildRows());
              :       // Flush the remaining unmatched build rows of any 
partitions we are done
              :       // processing before moving onto the next partition.
              :       OutputUnmatchedBuild(out_batch);
              :       if (!output_build_partitions_.empty()) break;
It seems unfortunate that we cannot remove this given the change at line 588. I 
suppose this is needed if multiple output batches are needed full to hold all 
the unmatched build rows.

The duplicated code block at line 487 almost looks like we should use a goto 

Do you think there is a good way to refactor the code a bit ?

Line 554:     if (state_ == PARTITIONING_PROBE) {
DCHECK(input_partition_ == NULL);

PS19, Line 599:     if (builder_->null_aware_partition() == NULL) {
              :       DCHECK(null_aware_probe_partition_ == NULL);
Not your change but I find this duplicated check with line 501 a bit confusing 
here. Essentially, this is a slightly different check as it covers cases for 
all joins while the one at line 501 seems to cover the case in which the null 
aware partition has been processed completely if I understand it correctly.

I wonder if you can move this check into the if statement at line 599 and make 
it if (build_->null_aware_partition() != NULL) continue;

And move *eos = true; to the common path shared by all join types.

PS19, Line 724: false 
I suppose it's good to avoid small buffers on all probe partitions for 
consistency but does this increase the memory usage of NAAJ ?

Line 740:   if (probe_rows != NULL) {
Is there an error path in which probe_rows == NULL ? Looks like line 725 
implies probe_rows != NULL.

PS19, Line 751: false
Same question as above.

PS19, Line 968: new
nit: newly created

File be/src/util/runtime-profile.h:

PS19, Line 129: AddChildFirst
Feel free to ignore but PrependChild() seems more appropriate.

To view, visit http://gerrit.cloudera.org:8080/3873
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1e02ea9c7a7d1a0f373b11aa06c3237e1c7bd4cb
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to