Michael Ho has posted comments on this change.

Change subject: IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder

Patch Set 14:


Some comments and questions.

File be/src/exec/blocking-join-node.cc:

PS14, Line 297: SCOPED_TIMER(build_timer_);
We used to have two different timers in PHJ for measuring the time to hash the 
input rows into different partitions and the time to build the hash tables. It 
would be good to be able to retain that fined grained tracking PHJBuilder.

File be/src/exec/partitioned-hash-join-builder.cc:

Line 179:     largest_fraction = max(largest_fraction,
DCHECK_GT(num_build_rows, 0);

PS14, Line 318: a hash table
hash tables

Line 341:   // building hash tables to avoid building hash tables for 
partitions we'll spill anyway.
That's an interesting comment. If I understand it correctly, it means that 
calling InitSpilledPartitionProbeStreams() may cause more partitions to be 
spilled as it needs to pin write buffers for the spilled streams so we should 
make sure we allocate all the write streams to trigger all the extra spills 
which may happen before building hash tables. If so, would you mind elaborating 
a bit in the comment:

"Allocate probe buffers for all partitions that are already spilled. Do this 
before building hash tables as allocating probe buffers may cause some more 
partitions to be spilled. This avoids wasted work on building hash tables for 
partitions which end up being spilled eventually."

Do you know how often this case happens ? I suppose this complication will be 
gone eventually once reservation is in place.

Line 401:       RETURN_IF_ERROR(SpillPartition());
May help to document that failure to find any partition to spill (e.g. all 
partitions are spilled) will return an error code. It looks as if we may be in 
an infinite loop on the first glance.

PS14, Line 529: PhjBuilder::HashTableStoresNulls()
This seems to belong to PartitionedHashJoinNode conceptually.

Line 646:   do {
Please see comments in BlockingJoinNode,  it would be great to retain timers 
for InsertBatch() and ProcessBuildInput() separately for finer grain tracking.

PS14, Line 782: process_build_batch_fn_level0

File be/src/exec/partitioned-hash-join-builder.h:

Line 425: 
nit: unnecessary blank line ?

File be/src/exec/partitioned-hash-join-node.cc:

PS14, Line 151: 
This may be important information to retain.

PS14, Line 87: Expr::CreateExprTree(pool_, eq_join_conjunct.right
Does this overlap with CreateExprTree() in PhjBuilder::Init() ? Same question 
for build_expr_ctxs_. If they are not redundant, mind commenting on how they 
are used differently ?

Line 213:   for (unique_ptr<ProbePartition>& partition : spilled_partitions_)
missing { }

PS14, Line 494: to output

PS14, Line 585: hash_partitions_

Line 594:       continue;
Is there a reason why we cannot call OutputUnmatchedBuild() directly at this 
point ?

PS14, Line 996: s)
next_state ?

File be/src/exec/partitioned-hash-join-node.h:

PS12, Line 81: hash_partitions_

PS12, Line 88: probe_hash_partitions_

PS12, Line 112: input_partition_

File be/src/exec/partitioned-hash-join-node.h:

PS14, Line 437: 
Is this merged into build_timer_ in BlockingJoinNode now ? It may be helpful 
for debugging to maintain two separate timers.

PS14, Line 81: hash_partitions_

PS14, Line 84: This is the only phase 
These are the only phases

PS14, Line 88: probe_hash_partitions_

PS14, Line 112: input_partition_

PS14, Line 118: input_partition_

PS14, Line 298: spilled_partitions_

PS14, Line 302: probe_batch_pos_

PS14, Line 306: probe_batch_pos_

PS14, Line 309: input_partition_

PS14, Line 408: null_aware_

Line 445:     /// block cannot be acquired. "delete_on_read" mode is used, so 
blocks in the stream
... used for the buffered tuple stream, so..

To view, visit http://gerrit.cloudera.org:8080/3873
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1e02ea9c7a7d1a0f373b11aa06c3237e1c7bd4cb
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to