Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
......................................................................


Patch Set 1:

(6 comments)

Did a pass over this, focusing on the backend implementation.

http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc
File be/src/exec/scalar-check-node.cc:

http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc@31
PS1, Line 31: NULL
Prefer nullptr in new code.


http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc@32
PS1, Line 32:       child_row_idx_(0),
We've mostly been switching to initialising member variables to constants 
inline in the class definition.


http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc@56
PS1, Line 56:   if (child_row_batch_->num_rows() == 0) {
> Is it possible/ok for this condition to be false? See my comment in scalar-
I think this needs to be a while loop that resets the batch after each 
iteration. It's valid for GetNext() to return a row batch with 0 rows even 
before eos. This is sometimes an implementation artefact and sometimes to flush 
memory resources attached to the batch.

So if GetNext() returns 0 rows and would have returned 1 rows on the next call, 
I think this would incorrectly fail. Or if GetNext() returned 1 row && !eos, 
then 0 rows && eos.

I think the two valid termination conditions for this node are if it hits eos 
or if it sees > 1 row.


http://gerrit.cloudera.org:8080/#/c/9005/1/be/src/exec/scalar-check-node.cc@78
PS1, Line 78:   child_row_batch_->TransferResourceOwnership(row_batch);
> Just an idea, not sure if a good one: it may be better to deep copy the row
I think it's the start of a good idea. I originally suggested modelling this 
after SelectNode(), which is a streaming node that passes through its input 
rows.

However, if we deep copy the output row, we could implement this as a blocking 
node that consumes its input in Open(), copies the single surviving row, then 
closes the input tree.

Blocking nodes can reduce resource requirements for a plan because the planner 
knows that operators above and below the blocking node in the same fragment 
don't execute concurrently. There's a virtual function in the frontend ExecNode 
implementation that determines if it's blocking or not.


http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/9005/1/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@735
PS1, Line 735:    * @throws ImpalaException
We don't generally have @throws declarations in Java comments


http://gerrit.cloudera.org:8080/#/c/9005/1/testdata/workloads/functional-query/queries/QueryTest/subquery.test
File testdata/workloads/functional-query/queries/QueryTest/subquery.test:

http://gerrit.cloudera.org:8080/#/c/9005/1/testdata/workloads/functional-query/queries/QueryTest/subquery.test@839
PS1, Line 839: ---- QUERY
Maybe add a test that sets num_nodes=1 to make sure that the single-node 
versino of the plans are executable.



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Attila Jeges <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Thu, 11 Jan 2018 23:32:50 +0000
Gerrit-HasComments: Yes

Reply via email to