Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
......................................................................


Patch Set 23:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java:

http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@425
PS23, Line 425:       analysisResult_.stmt_.origSqlString_ = 
analysisResult_.stmt_.toSql();
> This is pretty weird. See my comment in StatementBase.
Removed it from here.
Now I'm setting origSqlString_ in SelectStmt.analyze() and UnionStmt.analyze().


http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/analysis/StatementBase.java
File fe/src/main/java/org/apache/impala/analysis/StatementBase.java:

http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/analysis/StatementBase.java@43
PS23, Line 43:   protected String origSqlString_ = null;
> Why is this here and not in QueryStmt? I think it's cleaner to have it in Q
OK, moved it to QueryStmt.

So when I set it in AnalysisContext.analyze() after the first analysis phase 
(before any rewrites), wasn't it correctly set for all the statements?


http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1025
PS21, Line 1025:     if (isCorrelated && 
!item.getExpr().contains(Expr.IS_BUILTIN_AGG_FN)) {
> I looked into this and concluded that we should convert the type check to a
Turned it into a precondition.

To fix the error message I changed the condition in L1032 and introduced a new 
predicate called IS_UDA_FN.


http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@747
PS23, Line 747:     cardinalityCheckNode.computeStats(ctx_.getRootAnalyzer());
> Why do we need to computeStats() again?
I introduced it in PS2 as some kind of workaround, but it seems we no longer 
need it.
Removed this line.


http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java:

http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1163
PS23, Line 1163:     // Scalar subquery check is done at runtime, not at 
parse-time
> not at parse-time -> not during analysis
Done


http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1164
PS23, Line 1164:     AnalyzesOk("select id from functional.alltypestiny t where 
int_col = " +
> Loop over cmpOperators and have a few different variants inside the loop
Created a loop.


http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1174
PS23, Line 1174:     AnalyzesOk("select id from functional.alltypestiny t where 
int_col < " +
> Do we have a negative test that shows disallowing correlated scalar subquer
No, we don't have such tests, and currently the following query is allowed:

select * from functional.alltypes t1 where id < (select id from 
functional.alltypes t2 where t1.int_col = t2.int_col);

However, it raises the cardinality error, but I don't think we handle 
correlated subqueries correctly all the time.


http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1395
PS23, Line 1395:     AnalyzesOk("select count(1) from functional.alltypes " +
> Also add a variant with (subquery) IS NULL. That variant is currently forbi
Wasn't sure where to put IS NULL, so I put it two different places.
In the meantime I had to modify IsNullPredicate.java, and noticed a weird 
behavior:

 select count(1)
 from functional.alltypes
 where (select int_col from functional.alltypes) is null

The above query executes succesfully even though the subquery clearly returns 
multiple rows.

But when I use IS NOT NULL:

 select count(1)
 from functional.alltypes
 where (select int_col from functional.alltypes) is not null

Then the query fails with the expected error "Subquery must not return more 
than one row".

And when I refer to a column that is completely NULL, the results are the exact 
opposite, ie. IS NULL yields error and IS NOT NULL executes successfully.

I guess the explanation is that the IS NULL and IS NOT NULL predicates are 
pushed down to the scan node of the subquery.

I think the behavior is OK if we use the argument we already used before: "we 
only deduce information from the query statement, so the statement as a whole 
is correct".


http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java:

http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java@82
PS23, Line 82:       if (node instanceof StatementBase) {
> No need for this. We're always going to have a StatementBase
Switched to QueryStmt as you suggested.


http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java@917
PS23, Line 917:     //TODO: make it work again or delete
              : //    testToSql("select * from functional.alltypes where not 
(id < 10 and " +
              : //            "(int_col in (select int_col from 
functional.alltypestiny)) and " +
              : //            "(string_col = (select max(string_col) from 
functional.alltypestiny)))",
              : //        "SELECT * FROM functional.alltypes WHERE NOT (id < 10 
AND " +
              : //            "(int_col IN (SELECT int_col FROM 
functional.alltypestiny)) AND " +
              : //        "(string_col = (SELECT max(string_col) FROM 
functional.alltypestiny)))");
> Please file an improvement JIRA for this case, and remove the test.
Removed the test and filed IMPALA-6840.



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 23
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Thu, 12 Apr 2018 16:39:05 +0000
Gerrit-HasComments: Yes

Reply via email to