Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/9005 )
Change subject: IMPALA-6314: Add run time scalar subquery check for uncorrelated subqueries ...................................................................... Patch Set 23: (3 comments) http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/analysis/StatementBase.java File fe/src/main/java/org/apache/impala/analysis/StatementBase.java: http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/analysis/StatementBase.java@43 PS23, Line 43: protected String origSqlString_ = null; > OK, moved it to QueryStmt. Thinking about this again, you right that your previous code set the origSqlString correctly for all statements. Still, I think it's clearer to have it only where it's actually needed. For most statements it's just equivalent to toSql() and the assignment in analysis context is confusing (and not tested). http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java: http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1174 PS23, Line 1174: AnalyzesOk("select id from functional.alltypestiny t where int_col < " + > No, we don't have such tests, and currently the following query is allowed: My understanding is that we're not tackling correlated subqueries in this patch. The plan generated for that query is currently wrong, so we should fail it during analysis/rewriting. Here's an example from Postgres that shows why the Impala plan generated by this patch is wrong: Query succeeds: create table a (c1 int, c2 int); create table b (c1 int, c2 int); insert into a values(1,1),(2,2); insert intl b values(1,1),(2,2); select * from a x where x.c1 = (select y.c1 from b y where x.c2 = y.c2); Query fails at runtime: create table a (c1 int, c2 int); create table b (c1 int, c2 int); insert into a values(1,1),(2,2); insert intl b values(1,1),(2,2)(2,2); select * from a x where x.c1 = (select y.c1 from b y where x.c2 = y.c2); ERROR: more than one row returned by a subquery used as an expression http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1395 PS23, Line 1395: AnalyzesOk("select count(1) from functional.alltypes " + > Wasn't sure where to put IS NULL, so I put it two different places. Regardless of what optimizations we apply to make the query fast, we must return correct results. I think the IS NULL variant is should not succeed. I confirmed the expected behavior with Postgres. Both IS NULL and IS NOT NULL variants return an error "Subquery returns more than one row". Let's either fix this issue, or forbid these cases and file a JIRA to separately fix them. This makes me wonder whether queries like the following return correct results: select straight_join count(1) from functional.alltypes a where (select int_col from functional.alltypes b) > 10; Basically, when the non-subquery child is a constant we push the predicate down which I think is wrong. Same deal. We either need to forbid these cases and fix in a separate JIRA or fix them now. Likewise, Postgres returns the same "Subquery returns more than one row" for that query. I think the main problem is that we push the predicate "through a limit" which is generally not correct. We might be able to address that in the Planner when migrating conjuncts into inline views. Perhaps it's enough to not migrate conjuncts if the inline view's stmt is runtime scalar. -- To view, visit http://gerrit.cloudera.org:8080/9005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06 Gerrit-Change-Number: 9005 Gerrit-PatchSet: 23 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Attila Jeges <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Vuk Ercegovac <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Fri, 13 Apr 2018 16:32:15 +0000 Gerrit-HasComments: Yes
