Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
......................................................................


Patch Set 9:

(22 comments)

Thanks for the comments! I could resolve most of them, but still have some 
questions.

http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h
File be/src/exec/cardinality-check-node.h:

http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h@25
PS9, Line 25: namespace impala {
> Side note: I could hot a DCHECK with this query:
This query raises the cardinality error.
I also added this query to QueryTest/subquery.test


http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h@29
PS9, Line 29: /// Currently it is only used to check <= 1 cardinality of the 
child.
> I cannot think of a scenario where we would need to check for a cardinality
No, I just wanted make it more generic. Done.


http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h@38
PS9, Line 38:   virtual Status Open(RuntimeState* state) override;
> Thanks for your attempt and details. The following ideas might help:
Thanks for the tips. However, I still couldn't succeed. I used the following 
query:
 select 1 from (values(1 a)) v
 where a = (select id from functional.alltypestiny)
 limit 1
Plan is:
 PLAN-ROOT SINK
 03:HASH JOIN [RIGHT SEMI JOIN]
 |--00:UNION
 |     constant-operands=1
 02:CARDINALITY CHECK
 01:SCAN HDFS [functional.alltypestiny]

But, PartitionedHashJoinNode calls GetNext() twice on its child. First, it 
calls in Open() -> BlockingJoinNode::GetFirstProbeRow(state) -> 
CardinalityCheckNode::GetNext().
Then, in PartitionedHashJoinNode::GetNext() it starts to process the rows in 
the current rowbatch (fetched by Open()). After that it calls 
NextProbeRowBatch(), which calls GetNext() on CardinalityCheckNode, and this 
time it raises the error.

So as far as I can tell the followings have to go wrong to cause the bug:
- change the behavior of PartitionedHashJoinNode
- make CardinalityCheckNode a streaming node despite of the comments that warn 
you not to do that
- batch size happens to be zero

Anyway, I added an e2e to hs2/test_fetch.py, although I couldn't make it fail.


However, if CardinalityCheckNode is in a subplan, the following query succeeds 
(if batch_size=1, with default batch size it is flaky):
 select c_custkey
 from customer c
 where c_custkey < (select o_orderkey
                    from c.c_orders
                    where o_orderkey in (4285920, 6000000, 5000000)))
 limit 1;

o_orderkey 4285920 and 6000000 belong to the same customer, while o_orderkey 
5000000 belongs to a different customer.
Without LIMIT 1 it fails. But, since the subquery refers to an identifier from 
the outer query I guess it counts as a correlated subquery, while this JIRA is 
only about uncorrelated subqueries. Anyway, it might be a good test for the 
uncorrelated subquery JIRA.


http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h@58
PS9, Line 58:   // The associated SQL statement
> for error reporting
Done


http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.cc
File be/src/exec/cardinality-check-node.cc:

http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.cc@37
PS9, Line 37:   DCHECK_EQ(limit_, max_cardinality_);
> Add comment explaining why we are not calling ExecNode::Prepare()
we are calling it in L40.


http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java:

http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@189
PS9, Line 189:             subquery.getStatement().setIsRuntimeScalar(true);
> I don't think this approach is general enough to handle cases like the foll
Extended StmtRewriter with a new static method called 
rewriteNonScalarSubqueries(). It takes an expr, and tries to rewrite the 
contained subqueries' type to scalar.

Now these queries fail at runtime. They succeed if we add a WHERE clause with 
'id = 1'.

I added tests with these kind of queries.

Do you have any other corner case in mind?


http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java:

http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@94
PS9, Line 94:   public boolean isRuntimeScalar() { return isRuntimeScalar_; }
> please move these getters and setters to L365 to not clutter the members li
Done


http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@99
PS9, Line 99:   protected String origStmtStr_ = null;
> * origSqlString_
Renamed.
Added comment about it.


http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@101
PS9, Line 101:   public void saveStatementStringIfOriginal() {
> The correctness of this is pretty subtle and I think it would be easy to br
Regarding to views, I added the following query to QueryTest/subquery.test:
 SELECT id
 FROM functional.alltypes
 WHERE id = (SELECT bigint_col FROM functional.alltypes_view)
The test checks that the error message doesn't reveal the definition of 
alltypes_view.

I extended ToSqlTest as well to test getOrigStmtStr() with some queries 
borrowed from PlannerTest/subquery-rewrite.test.


http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
File fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java:

http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java@27
PS9, Line 27:  * Node that checks the cardinality of its child, i.e. it aborts 
the query if the child
> Simplify:
Simplified statement.
Now it is specialized to cardinality 1.


http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@739
PS9, Line 739:     Preconditions.checkState(
> Check is confusing because the CardinalityCheckNode is only allowed to have
Done


http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@746
PS9, Line 746:     // set the child explicitly, an ExchangeNode have been 
inserted
> grammar 'have'
added 'might' because in some cases we don't insert an ExchangeNode


http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1116
PS9, Line 1116:     if (inlineViewRef.getViewStmt().isRuntimeScalar()) {
> Logically this belongs above L1112 because the SELECT node contains predica
I moved it up.

Actually the following query still succeeds, but I think it shouldn't:
 select 1
 from functional.alltypessmall
 where id = 3 and id = (select id from functional.alltypes where id = 3 or id = 
3962);
The outer 'id = 3' is moved to the subquery, and it makes the subquery return 
only one row.
It is a weird case, because I think we want these kind of optimizations, but 
they prevent correct cardinality checking.


http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java:

http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1164
PS9, Line 1164:     // IMPALA-6314: Scalar subquery check is done at runtime, 
not at parse-time
> remove JIRA prefix since this is a feature
Done


http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1165
PS9, Line 1165:     AnalyzesOk("select id from functional.alltypestiny t where 
int_col = " +
> use a few different comparison ops (not just "=")
Done


http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1168
PS9, Line 1168:         "1 - (select int_col from functional.alltypessmall 
where id = 1)");
> * remove "where id = 1" or add other tests that show the query will obvious
Done


http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1170
PS9, Line 1170:         "(select int_col from functional.alltypessmall where id 
= 1) * 1");
> use something other than * 1 because that might be optimized away in the fu
Done


http://gerrit.cloudera.org:8080/#/c/9005/9/testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
File testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test:

http://gerrit.cloudera.org:8080/#/c/9005/9/testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test@1328
PS9, Line 1328: # Subquery in binary predicate that does cardinality check at 
parse-time
> * Please move these tests to subquery.test where we specifically test subqu
* I moved these tests to PlannerTest/subquery-rewrite.test
* Do you mean I should use different queries than the ones in 
QueryTest/subquery.test?


http://gerrit.cloudera.org:8080/#/c/9005/9/testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test@1432
PS9, Line 1432: |  |  max cardinality: 1
> How about we get rid of 'max cardinality' and only use the limit there will
Done


http://gerrit.cloudera.org:8080/#/c/9005/9/testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test@1476
PS9, Line 1476:   3 * (select id
> Add a few tests with something more complex in the subquery that is not jus
* Added join and grouping aggregation, union test to subquery-rewrite.test
* Added nested subqueries tests to nested-collections.test


http://gerrit.cloudera.org:8080/#/c/9005/9/testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
File 
testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test:

http://gerrit.cloudera.org:8080/#/c/9005/9/testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test@674
PS9, Line 674: # IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
> * Add 1-2 planner tests that show the plan for queries like this. You can a
* Added planner tests to nested-collections.test
* Removed the IMPALA-6314 annotations


http://gerrit.cloudera.org:8080/#/c/9005/9/testdata/workloads/functional-query/queries/QueryTest/subquery.test
File testdata/workloads/functional-query/queries/QueryTest/subquery.test:

http://gerrit.cloudera.org:8080/#/c/9005/9/testdata/workloads/functional-query/queries/QueryTest/subquery.test@920
PS9, Line 920: Query aborted:Subquery must return maximum 1 row(s):
> Error is odd because today the only valid limit for the cardinality check n
Done



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Attila Jeges <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Wed, 31 Jan 2018 16:17:57 +0000
Gerrit-HasComments: Yes

Reply via email to