Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 )
Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries ...................................................................... Patch Set 23: (10 comments) http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java: http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@425 PS23, Line 425: analysisResult_.stmt_.origSqlString_ = analysisResult_.stmt_.toSql(); > This is pretty weird. See my comment in StatementBase. Removed it from here. Now I'm setting origSqlString_ in SelectStmt.analyze() and UnionStmt.analyze(). http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/analysis/StatementBase.java File fe/src/main/java/org/apache/impala/analysis/StatementBase.java: http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/analysis/StatementBase.java@43 PS23, Line 43: protected String origSqlString_ = null; > Why is this here and not in QueryStmt? I think it's cleaner to have it in Q OK, moved it to QueryStmt. So when I set it in AnalysisContext.analyze() after the first analysis phase (before any rewrites), wasn't it correctly set for all the statements? http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/9005/21/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1025 PS21, Line 1025: if (isCorrelated && !item.getExpr().contains(Expr.IS_BUILTIN_AGG_FN)) { > I looked into this and concluded that we should convert the type check to a Turned it into a precondition. To fix the error message I changed the condition in L1032 and introduced a new predicate called IS_UDA_FN. http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@747 PS23, Line 747: cardinalityCheckNode.computeStats(ctx_.getRootAnalyzer()); > Why do we need to computeStats() again? I introduced it in PS2 as some kind of workaround, but it seems we no longer need it. Removed this line. http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java: http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1163 PS23, Line 1163: // Scalar subquery check is done at runtime, not at parse-time > not at parse-time -> not during analysis Done http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1164 PS23, Line 1164: AnalyzesOk("select id from functional.alltypestiny t where int_col = " + > Loop over cmpOperators and have a few different variants inside the loop Created a loop. http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1174 PS23, Line 1174: AnalyzesOk("select id from functional.alltypestiny t where int_col < " + > Do we have a negative test that shows disallowing correlated scalar subquer No, we don't have such tests, and currently the following query is allowed: select * from functional.alltypes t1 where id < (select id from functional.alltypes t2 where t1.int_col = t2.int_col); However, it raises the cardinality error, but I don't think we handle correlated subqueries correctly all the time. http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1395 PS23, Line 1395: AnalyzesOk("select count(1) from functional.alltypes " + > Also add a variant with (subquery) IS NULL. That variant is currently forbi Wasn't sure where to put IS NULL, so I put it two different places. In the meantime I had to modify IsNullPredicate.java, and noticed a weird behavior: select count(1) from functional.alltypes where (select int_col from functional.alltypes) is null The above query executes succesfully even though the subquery clearly returns multiple rows. But when I use IS NOT NULL: select count(1) from functional.alltypes where (select int_col from functional.alltypes) is not null Then the query fails with the expected error "Subquery must not return more than one row". And when I refer to a column that is completely NULL, the results are the exact opposite, ie. IS NULL yields error and IS NOT NULL executes successfully. I guess the explanation is that the IS NULL and IS NOT NULL predicates are pushed down to the scan node of the subquery. I think the behavior is OK if we use the argument we already used before: "we only deduce information from the query statement, so the statement as a whole is correct". http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java: http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java@82 PS23, Line 82: if (node instanceof StatementBase) { > No need for this. We're always going to have a StatementBase Switched to QueryStmt as you suggested. http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java@917 PS23, Line 917: //TODO: make it work again or delete : // testToSql("select * from functional.alltypes where not (id < 10 and " + : // "(int_col in (select int_col from functional.alltypestiny)) and " + : // "(string_col = (select max(string_col) from functional.alltypestiny)))", : // "SELECT * FROM functional.alltypes WHERE NOT (id < 10 AND " + : // "(int_col IN (SELECT int_col FROM functional.alltypestiny)) AND " + : // "(string_col = (SELECT max(string_col) FROM functional.alltypestiny)))"); > Please file an improvement JIRA for this case, and remove the test. Removed the test and filed IMPALA-6840. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 23 Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Thu, 12 Apr 2018 16:39:05 +0000 Gerrit-HasComments: Yes