Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder

Patch Set 14:


File be/src/exec/partitioned-hash-join-builder.h:

PS14, Line 45: build row partitions
> is this the same as the "hash partitions" referred to in the next paragraph
really the streams, reworded.

Line 86: 
> is it expected that the renaming public functions would only be used by PHJ

PS14, Line 90: Acquire ownership
> Doesn't the ownership pass from this class to the exec node?  In which case
It passes from the class to the caller. I guess I don't find this ambiguous 
given the return type and can't think of a non-awkward alternative.

PS14, Line 95: Called after probing of the partitions
             :   /// is done. The partitions are not closed or destroyed, since 
they may be spilled or
             :   /// may contain unmatched build rows for certain join modes 
(e.g. right outer join).
> The fact that we need some much explanation kinda makes this seem like it m
The hash join node uses this to determine whether it's already cleaned up the 
hash partitions, and to keep the build/probe hash partition vectors in sync.

Agree this isn't the interface we ultimately want but I wanted to avoid 
additional restructuring of GetNext()/CleanupHashPartitions().

Line 106:   }
> nit: missing blank line

PS14, Line 109: When this call returns
> Is this stuff by this method or the caller?  i.e. if this method closes inp

PS14, Line 112: so that the probe phase has enough buffers preallocated
              :   /// execute successfully.
> slightly garbled. seems like a word is missing or something.

PS14, Line 117: /// Iterates over all the partitions in 'hash_partitions_' and 
returns the largest
              :   /// number of build rows.
> rather than say how and referring to private member, and leaving it to the 

Line 123:   bool HashTableStoresNulls() const;
> Both classes already have their own hash table context.
Also, I think this makes sense to have here since the builder manages the hash 

PS14, Line 182: partition side of this partition
> not sure what this is saying. maybe just "for this build partition"?

Line 188:     /// false.
> maybe clarify that an error is not returned in that case.  or clarify what 

PS14, Line 195: unpin_all_build
> why "build"?  shouldn't it be unpin_all_blocks?  or even better would be to
Overall these bool arguments are really confusing, and the correctness of the 
spilling depends a lot on using the right value in the right place. I ended up 
changing this significantly to use an enum all the way down to the underlying 
BufferedTupleStream method, and adding this to SpillPartition(). It turns out 
there was a bug in BuildHashTablesAndPrepareProbeStreams() where it should have 
been unpinning all of the blocks in the partition it chose to spill but wasn't. 

I'm rerunning the memory experiment just to confirm this doesn't change 

PS14, Line 234: build or probe
> does this care about the probe side partitioning?
Yes, since we share the reservation and hand off the probe buffers.

PS14, Line 244: all hash partitions for partitioning level
> this is kinda misleading. maybe qualify with ... for the current input, or 

PS14, Line 252: in
> into

PS14, Line 256: in memory
> what does this mean? is it trying to say it fits in the current (pinned) bl
Reworded it - I think it makes more sense to think of it as whether appending 
to the stream succeeds (which may advance the block) or whether we have to 
start spilling things.

To view, visit http://gerrit.cloudera.org:8080/3873
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1e02ea9c7a7d1a0f373b11aa06c3237e1c7bd4cb
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to