Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8322 )

Change subject: IMPALA-1422: support a constant on LHS of IN predicates.
......................................................................


Patch Set 6:

(12 comments)

Some high-level comments before I dig deeper.

http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/SlotRef.java
File fe/src/main/java/org/apache/impala/analysis/SlotRef.java:

http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@94
PS2, Line 94:     // TODO: derived slot refs (e.g., star-expanded) will not 
have rawPath set.
Why can't this be addressed in this patch?


http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@287
PS2, Line 287:    * rewritten, null is returned. If 'inPred' is rewritten, the 
resulting expression
... and the RHS is a subquery.


http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@299
PS2, Line 299:    * 2) Predicate is NOT IN and RHS returns a single row.
What if the RHS subquery is correlated?


http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@319
PS2, Line 319:    *     NOT EXISTS (RHS AND (C IS NULL OR (CASE WHEN e IS NULL 
THEN C ELSE e END) = C)
Is the transformation even correct if RHS is correlated?


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@312
PS6, Line 312:    *    a) No group by or analytic function in subquery.
Ideally, we should not have to distinguish between cases 3a and 3b, and we 
should always do this rewrite:

C NOT IN (RHS)

Rewrites to:

NOT EXISTS (SELECT 1 FROM (RHS) v WHERE C IS NULL OR IFNULL(v.e, C) = C)

My understanding is that such a rewrite dues not work for case 3a when there is 
correlation, so we use an alternate rewrite that does not require a new inline 
view. I think we should reorganize the comments here to reflect this line of 
thinking and the limitations. For example, something like this:


3) Predicate is NOT IN and RHS returns multiple rows.

General rewrite: <the generic rewrite using inline view>
<briefly explain why we cannot use this rewrite to motivate cases a and b below>

a) No group by or analytic
<explain why the general rewrite does not work>

b) Group by  or analytic function in subquery
<explain why the general rewrite can be simplified>


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@316
PS6, Line 316:    *    REWRITE:
Use IFNULL() instead of CASE to simplify.


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@319
PS6, Line 319:    *     NOT EXISTS (RHS AND (C IS NULL OR (CASE WHEN e IS NULL 
THEN C ELSE e END) = C)
I think there's another case where this rewrite is not correct: when the 
subquery has an order by + limit. Basically, we are reimplementing the 
"predicate push down" correctness checks here.

I'm wondering whether we should reduce the scope of this patch and only allow 
those case where the generic rewrite works. In a separate JIRA we should 
consider whether we can make the generic rewrite also work for case 3a, and if 
not what to do instead.

What do you think?


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@321
PS6, Line 321:    *    Example:
The following query gives me an IllegalStateException:

explain select id from functional.alltypes t1 where 10 not in (select id from 
functional.alltypestiny order by id limit 2);

Existing bug or related to this change?


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@328
PS6, Line 328:    *    Note: it useful to think of C NOT IN (RHS) as the 
boolean expansion:
it is useful


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@340
PS6, Line 340:    *    b) Group by  or analytic function in subquery.
extra space after "by"


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@345
PS6, Line 345:    *      NOT EXISTS (SELECT x FROM (RHS) t WHERE C = t or t IS 
NOT NULL)
What does C=t mean here?
"t" is a table alias and cannot be compared to a C

It's not immediately clear to me why the rewrite here is equivalent to the 
generic rewrite described in 3a. Please explain (see my suggestion above). 
Alternatively, I'm also ok if you want to just use the generic rewrite here.


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@354
PS6, Line 354:    *     - Assumes that subquery is uncorrelated, which is 
currently a requirement.
What does this note refer to? Above you state that "All cases apply to both 
correlated and uncorrelated subqueries"



--
To view, visit http://gerrit.cloudera.org:8080/8322
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d69889a3c72e90be9d4ccf47d2816819ae32acb
Gerrit-Change-Number: 8322
Gerrit-PatchSet: 6
Gerrit-Owner: Vuk Ercegovac <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
Gerrit-Comment-Date: Sat, 04 Nov 2017 00:17:47 +0000
Gerrit-HasComments: Yes

Reply via email to