Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9005 )

Change subject: IMPALA-6314: Add run time scalar subquery check for 
uncorrelated subqueries
......................................................................


Patch Set 23:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/analysis/StatementBase.java
File fe/src/main/java/org/apache/impala/analysis/StatementBase.java:

http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/main/java/org/apache/impala/analysis/StatementBase.java@43
PS23, Line 43:   protected String origSqlString_ = null;
> OK, moved it to QueryStmt.
Thinking about this again, you right that your previous code set the 
origSqlString correctly for all statements. Still, I think it's clearer to have 
it only where it's actually needed. For most statements it's just equivalent to 
toSql() and the assignment in analysis context is confusing (and not tested).


http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java:

http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1174
PS23, Line 1174:     AnalyzesOk("select id from functional.alltypestiny t where 
int_col < " +
> No, we don't have such tests, and currently the following query is allowed:
My understanding is that we're not tackling correlated subqueries in this 
patch. The plan generated for that query is currently wrong, so we should fail 
it during analysis/rewriting.

Here's an example from Postgres that shows why the Impala plan generated by 
this patch is wrong:

Query succeeds:
create table a (c1 int, c2 int);
create table b (c1 int, c2 int);
insert into a values(1,1),(2,2);
insert intl b values(1,1),(2,2);
select * from a x where x.c1 = (select y.c1 from b y where x.c2 = y.c2);

Query fails at runtime:
create table a (c1 int, c2 int);
create table b (c1 int, c2 int);
insert into a values(1,1),(2,2);
insert intl b values(1,1),(2,2)(2,2);
select * from a x where x.c1 = (select y.c1 from b y where x.c2 = y.c2);
ERROR: more than one row returned by a subquery used as an expression


http://gerrit.cloudera.org:8080/#/c/9005/23/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1395
PS23, Line 1395:     AnalyzesOk("select count(1) from functional.alltypes " +
> Wasn't sure where to put IS NULL, so I put it two different places.
Regardless of what optimizations we apply to make the query fast, we must 
return correct results.

I think the IS NULL variant is should not succeed. I confirmed the expected 
behavior with Postgres. Both IS NULL and IS NOT NULL variants return an error 
"Subquery returns more than one row".

Let's either fix this issue, or forbid these cases and file a JIRA to 
separately fix them.

This makes me wonder whether queries like the following return correct results:
select straight_join count(1) from functional.alltypes a
where (select int_col from functional.alltypes b) > 10;

Basically, when the non-subquery child is a constant we push the predicate down 
which I think is wrong. Same deal. We either need to forbid these cases and fix 
in a separate JIRA or fix them now. Likewise, Postgres returns the same 
"Subquery returns more than one row" for that query.

I think the main problem is that we push the predicate "through a limit" which 
is generally not correct. We might be able to address that in the Planner when 
migrating conjuncts into inline views. Perhaps it's enough to not migrate 
conjuncts if the inline view's stmt is runtime scalar.



--
To view, visit http://gerrit.cloudera.org:8080/9005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f52b93a60eeacedd242a2f17fa6b99c4fc38e06
Gerrit-Change-Number: 9005
Gerrit-PatchSet: 23
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Attila Jeges <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Fri, 13 Apr 2018 16:32:15 +0000
Gerrit-HasComments: Yes

Reply via email to