Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16266 )
Change subject: IMPALA-5022: Outer join simplification ...................................................................... Patch Set 5: (10 comments) Hi Xianqing, thank you so much for this contribution! I'll need to do another pass and go over the tests but here are some initial comments. http://gerrit.cloudera.org:8080/#/c/16266/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16266/5//COMMIT_MSG@17 PS5, Line 17: one null rejecting condition on the inner table. Consider adding a query option like DISABLE_OUTER_TO_INNER_REWRITE so disable this optimization if needed as runtime. http://gerrit.cloudera.org:8080/#/c/16266/5//COMMIT_MSG@34 PS5, Line 34: * Ran the full set of verifications in Impala Public Jenkins Please try out TPC-DS Q49 there the LOJ queries in there should be rewritten. http://gerrit.cloudera.org:8080/#/c/16266/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/16266/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3217 PS5, Line 3217: getWhereClauseConjuncts( Technically this would include having clause conjuncts as well, so might be misleading to name this function getWhereClauseConuncts. http://gerrit.cloudera.org:8080/#/c/16266/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3227 PS5, Line 3227: } As a further optimization you could use getEquivClassesOnTuples() to also check for null filtering conditions that come as a result of a transitive relationship. For example T1 LEFT OUTER JOIN T2 ON (T1.a = T2.a) JOIN T3 ON (T3.b=T2.b) WHERE T3.b > 10; http://gerrit.cloudera.org:8080/#/c/16266/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3270 PS5, Line 3270: analyzeNoThrow > For some common SQL functions, we probably can directly test their existen Agreed it would be good to have some static expressions that we know won't reject nulls for example. col IS NULL col1 IS DISTINCT FROM col2 for things like IN and COALESCE you would recursively check the children. IF and CASE are trickier so you might want to call the BE or just skip those. http://gerrit.cloudera.org:8080/#/c/16266/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3427 PS5, Line 3427: // Recompute the graph since we may need to add value-transfer edges based on the See later comment in Planner, but it might be better to return this and have the caller recompute the graph. http://gerrit.cloudera.org:8080/#/c/16266/4/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: http://gerrit.cloudera.org:8080/#/c/16266/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@980 PS4, Line 980: his instanceof CompoundPredicate : && ((CompoundPredicate) this).getOp() == CompoundPredicate.Operator.OR You could use Expr.IS_OR_PREDICATE(this) here. http://gerrit.cloudera.org:8080/#/c/16266/5/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: http://gerrit.cloudera.org:8080/#/c/16266/5/fe/src/main/java/org/apache/impala/analysis/Expr.java@978 PS5, Line 978: public List<Expr> getDisjunctiveConjuncts() { There is something off about this interface. You assume the caller the first time this is called has verified that the predicate is an OR. For example is someone called this function with just a plan Expr then it would return the Expr back. You might want to move the IS_OR_PREDICATE call from Analyzer.java#3245 into it's own wrapper function, which then calls this method. http://gerrit.cloudera.org:8080/#/c/16266/5/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/16266/5/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@748 PS5, Line 748: // Transform outer join into inner join whenever possible Might want to use some state in the analyzer to check if any Outer Joins exist in the query and only then call this function then. For example globalstate_.outerJountTupleIds. http://gerrit.cloudera.org:8080/#/c/16266/5/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@749 PS5, Line 749: analyzer.simplifyOuterJoins(selectStmt.getTableRefs()); It would be if you returned some indicator that the value transfer graph needs to be recomputed. Then recompute the graph here so you can make the time line event accordingly. ctx_.getTimeline().markEvent("Recomputing value transfer graph") Also if the SingleNodePlanner's valueTransferGraphNeedsUpdate_ was set to true you could likely reset it after you recompute the graph. -- To view, visit http://gerrit.cloudera.org:8080/16266 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaa7804033fac68e93f33c387dc68ef67f803e93e Gerrit-Change-Number: 16266 Gerrit-PatchSet: 5 Gerrit-Owner: Xianqing He <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Shant Hovsepian <[email protected]> Gerrit-Reviewer: Xianqing He <[email protected]> Gerrit-Comment-Date: Tue, 04 Aug 2020 19:22:29 +0000 Gerrit-HasComments: Yes
