Shant Hovsepian has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16266 )

Change subject: IMPALA-5022: Outer join simplification
......................................................................


Patch Set 5:

(10 comments)

Hi Xianqing, thank you so much for this contribution! I'll need to do another 
pass and go over the tests but here are some initial comments.

http://gerrit.cloudera.org:8080/#/c/16266/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16266/5//COMMIT_MSG@17
PS5, Line 17: one null rejecting condition on the inner table.
Consider adding a query option like DISABLE_OUTER_TO_INNER_REWRITE so disable 
this optimization if needed as runtime.


http://gerrit.cloudera.org:8080/#/c/16266/5//COMMIT_MSG@34
PS5, Line 34: * Ran the full set of verifications in Impala Public Jenkins
Please try out TPC-DS Q49 there the LOJ queries in there should be rewritten.


http://gerrit.cloudera.org:8080/#/c/16266/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/16266/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3217
PS5, Line 3217: getWhereClauseConjuncts(
Technically this would include having clause conjuncts as well, so might be 
misleading to name this function getWhereClauseConuncts.


http://gerrit.cloudera.org:8080/#/c/16266/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3227
PS5, Line 3227:       }
As a further optimization you could use getEquivClassesOnTuples() to also check 
for null filtering conditions that come as a result of a transitive 
relationship.

For example T1 LEFT OUTER JOIN T2 ON (T1.a = T2.a) JOIN T3 ON (T3.b=T2.b) WHERE 
T3.b > 10;


http://gerrit.cloudera.org:8080/#/c/16266/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3270
PS5, Line 3270: analyzeNoThrow
> For some common SQL functions,  we probably can directly test their existen
Agreed it would be good to have some static expressions that we know won't 
reject nulls for example.

col IS NULL
col1 IS DISTINCT FROM col2

for things like IN and COALESCE you would recursively check the children.

IF and CASE are trickier so you might want to call the BE or just skip those.


http://gerrit.cloudera.org:8080/#/c/16266/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3427
PS5, Line 3427:     // Recompute the graph since we may need to add 
value-transfer edges based on the
See later comment in Planner, but it might be better to return this and have 
the caller recompute the graph.


http://gerrit.cloudera.org:8080/#/c/16266/4/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/16266/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@980
PS4, Line 980: his instanceof CompoundPredicate
             :         && ((CompoundPredicate) this).getOp() == 
CompoundPredicate.Operator.OR
You could use Expr.IS_OR_PREDICATE(this) here.


http://gerrit.cloudera.org:8080/#/c/16266/5/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/16266/5/fe/src/main/java/org/apache/impala/analysis/Expr.java@978
PS5, Line 978:   public List<Expr> getDisjunctiveConjuncts() {
There is something off about this interface. You assume the caller the first 
time this is called has verified that the predicate is an OR. For example is 
someone called this function with just a plan Expr then it would return the 
Expr back.

You might want to move the IS_OR_PREDICATE call from Analyzer.java#3245 into 
it's own wrapper function, which then calls this method.


http://gerrit.cloudera.org:8080/#/c/16266/5/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/16266/5/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@748
PS5, Line 748:     // Transform outer join into inner join whenever possible
Might want to use some state in the analyzer to check if any Outer Joins exist 
in the query and only then call this function then. For example 
globalstate_.outerJountTupleIds.


http://gerrit.cloudera.org:8080/#/c/16266/5/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@749
PS5, Line 749:     analyzer.simplifyOuterJoins(selectStmt.getTableRefs());
It would be if you returned some indicator that the value transfer graph needs 
to be recomputed. Then recompute the graph here so you can make the time line 
event accordingly.

ctx_.getTimeline().markEvent("Recomputing value transfer graph")

Also if the SingleNodePlanner's valueTransferGraphNeedsUpdate_ was set to true 
you could likely reset it after you recompute the graph.



--
To view, visit http://gerrit.cloudera.org:8080/16266
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa7804033fac68e93f33c387dc68ef67f803e93e
Gerrit-Change-Number: 16266
Gerrit-PatchSet: 5
Gerrit-Owner: Xianqing He <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Shant Hovsepian <[email protected]>
Gerrit-Reviewer: Xianqing He <[email protected]>
Gerrit-Comment-Date: Tue, 04 Aug 2020 19:22:29 +0000
Gerrit-HasComments: Yes

Reply via email to