Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15096 )

Change subject: WIP: IMPALA-9156: share broadcast join builds
......................................................................


Patch Set 6:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/15096/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15096/6//COMMIT_MSG@18
PS6, Line 18: MarkNeedsDeepCopy
nit: it could be mentioned in IMPALA-4179 that MarkNeedsDeepCopy() was added 
again


http://gerrit.cloudera.org:8080/#/c/15096/6/be/src/exec/partitioned-hash-join-builder.h
File be/src/exec/partitioned-hash-join-builder.h:

http://gerrit.cloudera.org:8080/#/c/15096/6/be/src/exec/partitioned-hash-join-builder.h@183
PS6, Line 183: /// At each state transition where the builder state needs to be 
mutated, all probe
             : /// threads must arrive at the barrier before proceeding.
Should this also occur if one of the probes has 0 rows, isn't there an 
optimization for the case when the probe is not interested in a build partition 
at all?


http://gerrit.cloudera.org:8080/#/c/15096/6/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

http://gerrit.cloudera.org:8080/#/c/15096/6/be/src/exec/partitioned-hash-join-node.cc@578
PS6, Line 578:             out_batch->MarkNeedsDeepCopy();
It is not self evident at the first glance that this causes 
out_batch->AtCapacity() to be true. Can you add a comment or DCHECK to 
highlight this?


http://gerrit.cloudera.org:8080/#/c/15096/6/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/15096/6/be/src/runtime/query-state.h@243
PS6, Line 243: to
nit: "to" not needed


http://gerrit.cloudera.org:8080/#/c/15096/6/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/15096/6/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@453
PS6, Line 453:     int leftChildNodes = leftChildFragment.getNumNodes();
             :     if (rhsTree.getCardinality() != -1) {
             :       rhsDataSize = Math.round(
             :           rhsTree.getCardinality() * 
ExchangeNode.getAvgSerializedRowSize(rhsTree));
             :       if (leftChildNodes != -1) {
             :         // RHS data must be broadcast once to each node.
             :         broadcastCost = 2 * rhsDataSize * leftChildNodes;
             :       }
             :     }
What happens with null aware anti joins? Is the data broadcast "num instances 
per node" time to every node?


http://gerrit.cloudera.org:8080/#/c/15096/6/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
File fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java:

http://gerrit.cloudera.org:8080/#/c/15096/6/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java@367
PS6, Line 367: ExecutorMembershipSnapshot.getCluster().numExecutors()
nit: the result will be the same, but using numNodes_ here seems more logical 
to me.


http://gerrit.cloudera.org:8080/#/c/15096/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/15096/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1298
PS6, Line 1298:           totalNodes = Math.min(numLocalNodes + numRemoteNodes, 
cluster.numExecutors());
              :           // Exit early if we have maxed out our estimate of 
hosts/instances, to avoid
              :           // extraneous work in case the number of scan ranges 
dominates the number of
              :           // nodes.
              :           if (totalNodes == maxPossibleInstances) break;
These two lines do not seem to match: if getMaxInstancesPerNode() > 1, then 
totalNodes will never reach maxPossibleInstances



--
To view, visit http://gerrit.cloudera.org:8080/15096
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7
Gerrit-Change-Number: 15096
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Comment-Date: Mon, 24 Feb 2020 16:58:31 +0000
Gerrit-HasComments: Yes

Reply via email to