Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
......................................................................


Patch Set 11:

(24 comments)

http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.h
File be/src/exec/cardinality-check-node.h:

http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.h@27
PS11, Line 27: /// Node that passes along the row pulled from its child 
unchanged.
> First sentence should crisply state purpose of this node, e.g.:
Thanks, I am using your sentences :)


http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.h@29
PS11, Line 29: /// Note that this node must be a blocking node!
> Please give an explanation why, e.g.:
Thanks again, I don't think I could formulate it so nicely! :)


http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h
File be/src/exec/cardinality-check-node.h:

http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h@29
PS9, Line 29: /// Note that this node must be a blocking node!
> Thanks.
To my defense in PS1 it was called ScalarCheckNode and it only allowed one row 
:)
Later when I renamed it I felt that some extra flexibility suits the new name.
But I see that it is unlikely that we'll ever need this flexibility.


http://gerrit.cloudera.org:8080/#/c/9005/9/be/src/exec/cardinality-check-node.h@38
PS9, Line 38:   virtual Status Reset(RuntimeState* state) override;
> Nice experiments and tests! Seems really hard to hit this case with a real
Thanks! Done.


http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.cc
File be/src/exec/cardinality-check-node.cc:

http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.cc@68
PS11, Line 68:   DCHECK_LE(rows_collected, 1);
> DCHECK that rows_collected is either 0 or 1. The current check accepts nega
Done


http://gerrit.cloudera.org:8080/#/c/9005/11/be/src/exec/cardinality-check-node.cc@85
PS11, Line 85:     row_batch_->DeepCopyTo(output_row_batch);
> Why do we deep copy the row twice? Once in Open() should be sufficient.
Done


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java
File fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java:

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/ArithmeticExpr.java@175
PS11, Line 175:       StmtRewriter.rewriteNonScalarSubqueries(operand, 
analyzer);
> We should still keep the analysis and rewrite phases distinctly separate as
I'm a bit confused about how to organise this.
In AnalysisContext.analyze() I see the following phases:
* analyze
* rewrite exprs
* rewrite stmts
* re-analyze if needed

If I don't do anything here, this code will throw an exception that needs to be 
handled somewhere, but the real problem is that the first analyze phase will 
remain unfinished. Then, we would start applying rewrites on the partially 
analyzed tree.

So, some modifications are needed to analyzeImpl() I think. I tried to simply 
return from it when we encounter a non-scalar subquery. Then, do the rewrite in 
StmtRewriter, but analyze() is also called during statement rewriting. This is 
problematic because the AnalysisExceptions are wrapped in an 
IllegalStateException and that causes some tests to fail. This happens when the 
subquery is still invalid after making it scalar, e.g.:

 select id
 from functional.alltypestiny
 where int_col = (select max(timestamp_col) from
                  functional.alltypessmall)

 ERROR: AnalysisException: null
 CAUSED BY: IllegalStateException: Failed analysis after expr substitution.
 CAUSED BY: AnalysisException: operands of type INT and TIMESTAMP are not 
comparable: int_col = (SELECT max(timestamp_col) FROM functional.alltypessmall)


My subquery "rewrite" only changes the type of the subquery. Aren't these kind 
of type deductions belong to the analysis phase? Maybe I should move 
rewriteNonScalarSubqueries() to Expr and rename it to something like 
makeContainedSubquerySalar() ?

Am I missing something? Do you have something in mind how should I deal with it?


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java:

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@177
PS11, Line 177:       for (Expr expr: children_) {
> Move to the rewrite phase.
see comment in ArithmeticExpr.java


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@529
PS11, Line 529:       for (Expr child : getChildren()) {
> Move to rewrite phase.
see comment in ArithmeticExpr.java


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java:

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@94
PS11, Line 94:   // This member contains the original statement provided by the 
user.
> Not accurate. This contains the post-analysis toSql() before rewrites. It m
Updated the comment.


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@369
PS11, Line 369:   public String getOrigStmtStr() { return origSqlString_; }
> getOrigSqlString()
Done


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@371
PS11, Line 371:     if (origSqlString_ == null) origSqlString_ = toSql();
> remove function and inline this into the single call site
Done


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@245
PS11, Line 245:         StmtRewriter.rewriteNonScalarSubqueries(whereClause_, 
analyzer);
> move to rewrite phase
see comment in ArithmeticExpr.java


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@99
PS11, Line 99:   public static void rewriteNonScalarSubqueries(Expr expr, 
Analyzer analyzer)
> Should not be public and called from the outside during analysis. The subqu
see comment in ArithmeticExpr.java. I'll deal with it in the next patch set.


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@104
PS11, Line 104:     for (Subquery subquery : subqueries) {
> Why do we need to go through multiple subqueries? My understanding is that
The presence of multiple subqueries are checked at elsewhere and proper error 
messages are generated. I didn't want to deal with it at this level.

I'll rewrite/remove this method after conclusion is reached on the comment in 
ArithmeticExpr.java


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java
File fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java:

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java@27
PS11, Line 27:  * Node that passes along the row pulled from its child 
unchanged.
> Same comment as in the BE exec node
Done


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java@34
PS11, Line 34:   protected CardinalityCheckNode(PlanNodeId id, PlanNode child, 
String displayStatement) {
> nit: we typically abbreviate Statement with Stmt
Done


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/main/java/org/apache/impala/planner/CardinalityCheckNode.java@39
PS11, Line 39:     // Also, an optimized plan is generated with LIMIT.
> remove since it doesn't really add much info
removed the comment


http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/9005/9/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1116
PS9, Line 1116:     // TODO: This check is also repeated in 
migrateConjunctsToInlineView() because we
> Nice query. Returning incorrect results is definitely not acceptable.
The propagation of the conjunct 'id = 3' starts in InlineViewRef.analyze() when 
analyzer.createAuxEqPredicate() is called.

To me this optimisation seems quite intentional, and as far as I can tell it is 
only a problem when runtime cardinality check is needed.

So my quick fix is to not call analyzer.createAuxEqPredicate() when runtime 
cardinality check is needed.

What do you think?


http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java:

http://gerrit.cloudera.org:8080/#/c/9005/11/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java@1434
PS11, Line 1434:   public void testRewrittenStatements() {
> Instead of adding this new test, perhaps you can address the TODO in subque
Thanks for pointing to this test, I addressed the TODO and now the 
subqueryTest() fails because of the De Morgan's laws are applied to one of its 
query:

 select * from functional.alltypes where not (id < 10 and (int_col in (select 
int_col from functional.alltypestiny)) and (string_col = (select 
max(string_col) from functional.alltypestiny)))


 Subqueries in OR predicates are not supported: id >= 10 OR int_col NOT IN 
(SELECT int_col FROM functional.alltypestiny) OR string_col != (SELECT 
max(string_col) FROM functional.alltypestiny)

I temporarily commented it out and added a TODO because I'm not sure if I 
should file a Jira, or, we just don't allow these kind of queries anymore.


http://gerrit.cloudera.org:8080/#/c/9005/9/testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
File testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test:

http://gerrit.cloudera.org:8080/#/c/9005/9/testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test@1328
PS9, Line 1328:
> * Thanks
Got it, and removed two tests.


http://gerrit.cloudera.org:8080/#/c/9005/11/testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
File 
testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test:

http://gerrit.cloudera.org:8080/#/c/9005/11/testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test@709
PS11, Line 709: Subquery must return maximum 1 row(s):
> fix
Done


http://gerrit.cloudera.org:8080/#/c/9005/11/testdata/workloads/functional-query/queries/QueryTest/subquery.test
File testdata/workloads/functional-query/queries/QueryTest/subquery.test:

http://gerrit.cloudera.org:8080/#/c/9005/11/testdata/workloads/functional-query/queries/QueryTest/subquery.test@994
PS11, Line 994: # Subquery that returns more than one row
> comment what this test is for
Done


http://gerrit.cloudera.org:8080/#/c/9005/11/tests/hs2/test_fetch.py
File tests/hs2/test_fetch.py:

http://gerrit.cloudera.org:8080/#/c/9005/11/tests/hs2/test_fetch.py@204
PS11, Line 204:   @needs_session(conf_overlay={"num_nodes": "1", "batch_size": 
"1"})
> Thanks for experimenting and adding this test. I'm not sure there's much va
Done



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 11
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Wed, 14 Feb 2018 16:36:24 +0000
Gerrit-HasComments: Yes

Reply via email to