Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder

Patch Set 18:


File be/src/exec/partitioned-hash-join-builder.cc:

PS18, Line 326: we'd like all partitions to
> ... all partitions ..

PS18, Line 327: be
> are 

Line 353:   // building hash tables because allocating probe buffers may cause 
some more partitions
> duplicated sentences

PS18, Line 797: filters_.size() > 0)
> this doesn't seem to exactly match the logic at line 166 (that doesn't care
I simplified the code around line 166 by calculating 
bool build_filters; outside of the branches. Should be equivalent and easier to 
reason about.

File be/src/exec/partitioned-hash-join-builder.h:

Line 276:   /// Returns non-ok status if we couldn't spill a partition.
> nit: weird line breaking.

PS18, Line 280: should have all of their build rows
> what is this trying to say? that we're done partitioning the build?
Yeah, clarified in terms of FlushFinal().

PS18, Line 286: no buffers in either the build or probe stream.
> clarify whether this means neither has buffers or at least one does not hav

PS18, Line 287: in_mem
> are these states actually represented in code? (if not, why the underscore?

PS18, Line 291: probe_buffers_for_spilled_partitions_
> what's that?
Ah, stale comment from earlier version of the code. Updated it.

PS18, Line 301: probe_buffers_for_spilled_partitions_
> same
Ah, this method doesn't even exist and more. Removed up updated the 
CloseAndDeletePartitions() comment (since it does that work now).;

PS18, Line 349: PartitionedHashJoinNode
> who owns it?
Clarified. Also added a TODO to indicate this will need to be updated for the 
spilling broadcast join case when we come to design a solution for that.

PS18, Line 427: so that the
              :   /// initial probe buffer is allocated
> this doesn't really explain why the builder has to do it.  Would be good to

Line 432:   /// phase of the algorithm without spilling more partitions.
> this hints at it, so maybe move this up and reword to be more explicit.

File be/src/exec/partitioned-hash-join-node-ir.cc:

PS18, Line 285: *status
> if we can now do this, can we also just make NextProbeRow() and others retu
IIRC I started to do that refactoring at one point but quickly realised that it 
didn't work.

NextProbeRow() returns false both when status is an error and when it does not 
have a next row to move to.

Line 293:       if (LIKELY(hash_tbl != NULL)) {
> is probe_partition NULL in this case?
Yes it is. I added additional DCHECKs to PrepareForProbe() to validate the 
consistency of the partition state. I think adding additional DCHECKs here 
would obscure the logic somewhat. I thought it was more helpful below since the 
logic relied more directly on the correlation between the different partitions.

File be/src/exec/partitioned-hash-join-node.cc:

Line 47:     "the memory limit may help this query to complete successfully.";
> weird line breakages

File be/src/exec/partitioned-hash-join-node.h:

Line 404:   /// builder_->hash_partitions. This is non-empty only in the 
> underscore, right?
I guess the accessor is hash_partitions() so changed it to that.

PS18, Line 406: a
> and?

To view, visit http://gerrit.cloudera.org:8080/3873
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1e02ea9c7a7d1a0f373b11aa06c3237e1c7bd4cb
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to