Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 )
Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries ...................................................................... Patch Set 1: (6 comments) Did a pass over this, focusing on the backend implementation. http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc File be/src/exec/scalar-check-node.cc: http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc@31 PS1, Line 31: NULL Prefer nullptr in new code. http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc@32 PS1, Line 32: child_row_idx_(0), We've mostly been switching to initialising member variables to constants inline in the class definition. http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc@56 PS1, Line 56: if (child_row_batch_->num_rows() == 0) { > Is it possible/ok for this condition to be false? See my comment in scalar- I think this needs to be a while loop that resets the batch after each iteration. It's valid for GetNext() to return a row batch with 0 rows even before eos. This is sometimes an implementation artefact and sometimes to flush memory resources attached to the batch. So if GetNext() returns 0 rows and would have returned 1 rows on the next call, I think this would incorrectly fail. Or if GetNext() returned 1 row && !eos, then 0 rows && eos. I think the two valid termination conditions for this node are if it hits eos or if it sees > 1 row. http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc@78 PS1, Line 78: child_row_batch_->TransferResourceOwnership(row_batch); > Just an idea, not sure if a good one: it may be better to deep copy the row I think it's the start of a good idea. I originally suggested modelling this after SelectNode(), which is a streaming node that passes through its input rows. However, if we deep copy the output row, we could implement this as a blocking node that consumes its input in Open(), copies the single surviving row, then closes the input tree. Blocking nodes can reduce resource requirements for a plan because the planner knows that operators above and below the blocking node in the same fragment don't execute concurrently. There's a virtual function in the frontend ExecNode implementation that determines if it's blocking or not. http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@735 PS1, Line 735: * @throws ImpalaException We don't generally have @throws declarations in Java comments http://gerrit.cloudera.org:8080/#/c/9005/1/testdata/workloads/functional-query/queries/QueryTest/subquery.test File testdata/workloads/functional-query/queries/QueryTest/subquery.test: http://gerrit.cloudera.org:8080/#/c/9005/1/testdata/workloads/functional-query/queries/QueryTest/subquery.test@839 PS1, Line 839: ---- QUERY Maybe add a test that sets num_nodes=1 to make sure that the single-node versino of the plans are executable. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Attila Jeges <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Thu, 11 Jan 2018 23:32:50 +0000 Gerrit-HasComments: Yes
