Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15096 )
Change subject: WIP: IMPALA-9156: share broadcast join builds ...................................................................... Patch Set 6: (7 comments) http://gerrit.cloudera.org:8080/#/c/15096/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15096/6//COMMIT_MSG@18 PS6, Line 18: MarkNeedsDeepCopy nit: it could be mentioned in IMPALA-4179 that MarkNeedsDeepCopy() was added again http://gerrit.cloudera.org:8080/#/c/15096/6/be/src/exec/partitioned-hash-join-builder.h File be/src/exec/partitioned-hash-join-builder.h: http://gerrit.cloudera.org:8080/#/c/15096/6/be/src/exec/partitioned-hash-join-builder.h@183 PS6, Line 183: /// At each state transition where the builder state needs to be mutated, all probe : /// threads must arrive at the barrier before proceeding. Should this also occur if one of the probes has 0 rows, isn't there an optimization for the case when the probe is not interested in a build partition at all? http://gerrit.cloudera.org:8080/#/c/15096/6/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: http://gerrit.cloudera.org:8080/#/c/15096/6/be/src/exec/partitioned-hash-join-node.cc@578 PS6, Line 578: out_batch->MarkNeedsDeepCopy(); It is not self evident at the first glance that this causes out_batch->AtCapacity() to be true. Can you add a comment or DCHECK to highlight this? http://gerrit.cloudera.org:8080/#/c/15096/6/be/src/runtime/query-state.h File be/src/runtime/query-state.h: http://gerrit.cloudera.org:8080/#/c/15096/6/be/src/runtime/query-state.h@243 PS6, Line 243: to nit: "to" not needed http://gerrit.cloudera.org:8080/#/c/15096/6/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: http://gerrit.cloudera.org:8080/#/c/15096/6/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@453 PS6, Line 453: int leftChildNodes = leftChildFragment.getNumNodes(); : if (rhsTree.getCardinality() != -1) { : rhsDataSize = Math.round( : rhsTree.getCardinality() * ExchangeNode.getAvgSerializedRowSize(rhsTree)); : if (leftChildNodes != -1) { : // RHS data must be broadcast once to each node. : broadcastCost = 2 * rhsDataSize * leftChildNodes; : } : } What happens with null aware anti joins? Is the data broadcast "num instances per node" time to every node? http://gerrit.cloudera.org:8080/#/c/15096/6/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java File fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java: http://gerrit.cloudera.org:8080/#/c/15096/6/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java@367 PS6, Line 367: ExecutorMembershipSnapshot.getCluster().numExecutors() nit: the result will be the same, but using numNodes_ here seems more logical to me. http://gerrit.cloudera.org:8080/#/c/15096/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/15096/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1298 PS6, Line 1298: totalNodes = Math.min(numLocalNodes + numRemoteNodes, cluster.numExecutors()); : // Exit early if we have maxed out our estimate of hosts/instances, to avoid : // extraneous work in case the number of scan ranges dominates the number of : // nodes. : if (totalNodes == maxPossibleInstances) break; These two lines do not seem to match: if getMaxInstancesPerNode() > 1, then totalNodes will never reach maxPossibleInstances -- To view, visit http://gerrit.cloudera.org:8080/15096 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7 Gerrit-Change-Number: 15096 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Bikramjeet Vig <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Comment-Date: Mon, 24 Feb 2020 16:58:31 +0000 Gerrit-HasComments: Yes
