Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/16266 )
Change subject: IMPALA-5022: Outer join simplification ...................................................................... Patch Set 13: (13 comments) Some more comments. Thanks a lot for adding the run-time test and explanation. http://gerrit.cloudera.org:8080/#/c/16266/12/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/16266/12/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3483 PS12, Line 3483: if (tableRef.getLeftTblRef() != null) { : boolean ret = simplifyOuterJoinsByIjOnClause(processedTblRefs, tableRef); : isSimplified = isSimplified ? true : ret; Can this IF block to be invoked only when isSimplified is false? http://gerrit.cloudera.org:8080/#/c/16266/12/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3482 PS12, Line 3482: case INNER_JOIN: { : if (tableRef.getLeftTblRef() != null) { : boolean ret = simplifyOuterJoinsByIjOnClause(processedTblRefs, tableRef); : isSimplified = isSimplified ? true : ret; : } : break; > This case optimizes the pattern likes A left join B on A.id=B.id inner join Done http://gerrit.cloudera.org:8080/#/c/16266/13/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/16266/13/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3231 PS13, Line 3231: st<Expr Please add a comment for the new method. http://gerrit.cloudera.org:8080/#/c/16266/13/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3249 PS13, Line 3249: splitDisjunctiveConjuncts It probably is better to call this method collectTupleIdsForConjuncts(Expr e) that returns a Set<TupleId>. In this way, we can do intersection more efficiently and avoid the needs to create an ArrayList for each conjunct at line 3278. For complex queries, the number of expressions can be quite large. http://gerrit.cloudera.org:8080/#/c/16266/13/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3260 PS13, Line 3260: tupleIdIntersect( Please refer to the comment for line 3268, which may discard this method and instead rely on Java Set's operations, such as retainAll(). http://gerrit.cloudera.org:8080/#/c/16266/13/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3268 PS13, Line 3268: // Return fasle if the disjunctive conjunct containing the tuple id not in 'ids'. Sounds like this comment paragraph should be moved above line 3267 to act as the comment for the method. The mention of tuple id and 'ids' should match the arguments to the method. http://gerrit.cloudera.org:8080/#/c/16266/13/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3277 PS13, Line 3277: for (Expr disConjunct : disConjuncts) { : List<TupleId> tids = new ArrayList<>(); : disConjunct.getIds(tids, null); : if (!tupleIdIntersect(tids, tupleIds)) { : return true; : } : } Please refer to comment at line 3249. http://gerrit.cloudera.org:8080/#/c/16266/13/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3352 PS13, Line 3352: If convert an outer join to an inner join nit: reword to "When transforming an outer join to an inner join is feasible" http://gerrit.cloudera.org:8080/#/c/16266/13/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3365 PS13, Line 3365: convert a outer join to a left/right/inner join nit: reword to "When transforming an outer join to a left/right/inner join is feasible" http://gerrit.cloudera.org:8080/#/c/16266/13/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3383 PS13, Line 3383: If convert a outer join to an inner join nit: reword to "When transforming an outer join to an inner join is feasible" http://gerrit.cloudera.org:8080/#/c/16266/12/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: http://gerrit.cloudera.org:8080/#/c/16266/12/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@327 PS12, Line 327: f > In Analyzer#isNullableConjunct will check case expr. The function may be used elsewhere in future. http://gerrit.cloudera.org:8080/#/c/16266/13/testdata/workloads/functional-query/queries/QueryTest/outer-to-inner-joins.test File testdata/workloads/functional-query/queries/QueryTest/outer-to-inner-joins.test: http://gerrit.cloudera.org:8080/#/c/16266/13/testdata/workloads/functional-query/queries/QueryTest/outer-to-inner-joins.test@1 PS13, Line 1: == Can we add some more tests here? 1). select t2.id from functional.alltypes t1 left outer join functional.alltypessmall t2 on t1.id = t2.id where t1.int_col < 10; -- expect no conversion 2). select t2.tinyint_col, count(t2.id) from functional.alltypes t1 left outer join functional.alltypessmall t2 on t1.id = t2.id group by t2.tinyint_col -- expect no conversion having count(t2.id) > 0 3). Add simple tests involving conditional expressions -- expect no conversion 4). select 1 from (values(1))x where 1 in ( select count(t2.id) from functional.alltypes t1 left outer join functional.alltypessmall t2 on t1.id = t2.id where t2.int_col > 10 ) -- should be converted 5) select 1 from functional.alltypes t0, ( select t2.id ct from functional.alltypes t1 left outer join functional.alltypessmall t2 on t1.id = t2.id ) s where t0.int_col = s.ct -- should be converted, at least on paper http://gerrit.cloudera.org:8080/#/c/16266/13/tests/query_test/test_join_queries.py File tests/query_test/test_join_queries.py: http://gerrit.cloudera.org:8080/#/c/16266/13/tests/query_test/test_join_queries.py@96 PS13, Line 96: new_vector.get_value('exec_option')['batch_size'] = vector.get_value('batch_size') : new_vector.get_value('exec_option')['mt_dop'] = vector.get_value('mt_dop') Nice. Would you please repeat this section with the new query option: enable_outer_join_to_inner_transformation set to false? The above query option is very important to shut off the transformation if there is a bug at a production site. -- To view, visit http://gerrit.cloudera.org:8080/16266 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaa7804033fac68e93f33c387dc68ef67f803e93e Gerrit-Change-Number: 16266 Gerrit-PatchSet: 13 Gerrit-Owner: Xianqing He <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Shant Hovsepian <[email protected]> Gerrit-Reviewer: Xianqing He <[email protected]> Gerrit-Comment-Date: Wed, 26 Aug 2020 15:02:01 +0000 Gerrit-HasComments: Yes
