Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16266 )

Change subject: IMPALA-5022: Outer join simplification
......................................................................


Patch Set 13:

(13 comments)

Some more comments. Thanks a lot for adding the run-time test and explanation.

http://gerrit.cloudera.org:8080/#/c/16266/12/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/16266/12/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3483
PS12, Line 3483: if (tableRef.getLeftTblRef() != null) {
               :           boolean ret = 
simplifyOuterJoinsByIjOnClause(processedTblRefs, tableRef);
               :           isSimplified = isSimplified ? true : ret;
Can this IF block to be invoked only when isSimplified is false?


http://gerrit.cloudera.org:8080/#/c/16266/12/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3482
PS12, Line 3482:  case INNER_JOIN: {
               :         if (tableRef.getLeftTblRef() != null) {
               :           boolean ret = 
simplifyOuterJoinsByIjOnClause(processedTblRefs, tableRef);
               :           isSimplified = isSimplified ? true : ret;
               :         }
               :         break;
> This case optimizes the pattern likes A left join B on A.id=B.id inner join
Done


http://gerrit.cloudera.org:8080/#/c/16266/13/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/16266/13/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3231
PS13, Line 3231: st<Expr
Please add a comment for the new method.


http://gerrit.cloudera.org:8080/#/c/16266/13/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3249
PS13, Line 3249: splitDisjunctiveConjuncts
It probably is better to call this method collectTupleIdsForConjuncts(Expr e) 
that returns a Set<TupleId>. In this way, we can do intersection more 
efficiently and avoid the needs to create an ArrayList for each conjunct at 
line 3278.

For complex queries, the number of expressions can be quite large.


http://gerrit.cloudera.org:8080/#/c/16266/13/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3260
PS13, Line 3260: tupleIdIntersect(
Please refer to the comment for line 3268, which may discard this method and 
instead rely on Java Set's operations, such as retainAll().


http://gerrit.cloudera.org:8080/#/c/16266/13/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3268
PS13, Line 3268: // Return fasle if the disjunctive conjunct containing the 
tuple id not in 'ids'.
Sounds like this comment paragraph should be moved above line 3267 to act as 
the comment for the method. The mention of tuple id and 'ids' should match the 
arguments to the method.


http://gerrit.cloudera.org:8080/#/c/16266/13/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3277
PS13, Line 3277:       for (Expr disConjunct : disConjuncts) {
               :         List<TupleId> tids = new ArrayList<>();
               :         disConjunct.getIds(tids, null);
               :         if (!tupleIdIntersect(tids, tupleIds)) {
               :           return true;
               :         }
               :       }
Please refer to comment at line 3249.


http://gerrit.cloudera.org:8080/#/c/16266/13/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3352
PS13, Line 3352: If convert an outer join to an inner join
nit: reword to "When transforming an outer join to an inner join is feasible"


http://gerrit.cloudera.org:8080/#/c/16266/13/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3365
PS13, Line 3365: convert a outer join to a left/right/inner join
nit: reword to "When transforming an outer join to a left/right/inner join is 
feasible"


http://gerrit.cloudera.org:8080/#/c/16266/13/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@3383
PS13, Line 3383: If convert a outer join to an inner join
nit: reword to "When transforming an outer join to an inner join is feasible"


http://gerrit.cloudera.org:8080/#/c/16266/12/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

http://gerrit.cloudera.org:8080/#/c/16266/12/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@327
PS12, Line 327: f
> In Analyzer#isNullableConjunct will check case expr.
The function may be used elsewhere in future.


http://gerrit.cloudera.org:8080/#/c/16266/13/testdata/workloads/functional-query/queries/QueryTest/outer-to-inner-joins.test
File 
testdata/workloads/functional-query/queries/QueryTest/outer-to-inner-joins.test:

http://gerrit.cloudera.org:8080/#/c/16266/13/testdata/workloads/functional-query/queries/QueryTest/outer-to-inner-joins.test@1
PS13, Line 1: ==
Can we add some more tests here?

1). select t2.id from functional.alltypes t1
left outer join functional.alltypessmall t2
  on t1.id = t2.id where t1.int_col < 10;  -- expect no conversion

2). select t2.tinyint_col, count(t2.id)
from functional.alltypes t1
left outer join functional.alltypessmall t2
  on t1.id = t2.id
 group by t2.tinyint_col -- expect no conversion
 having count(t2.id) > 0

3). Add simple tests involving conditional expressions -- expect no conversion

4). select 1 from (values(1))x where 1 in
(
 select count(t2.id)
from functional.alltypes t1
left outer join functional.alltypessmall t2
  on t1.id = t2.id
where t2.int_col > 10
) -- should be converted

5) select 1
from functional.alltypes t0,
(
 select t2.id ct
from functional.alltypes t1
left outer join functional.alltypessmall t2
  on t1.id = t2.id
) s
where t0.int_col = s.ct -- should be converted, at least on paper


http://gerrit.cloudera.org:8080/#/c/16266/13/tests/query_test/test_join_queries.py
File tests/query_test/test_join_queries.py:

http://gerrit.cloudera.org:8080/#/c/16266/13/tests/query_test/test_join_queries.py@96
PS13, Line 96: new_vector.get_value('exec_option')['batch_size'] = 
vector.get_value('batch_size')
             :     new_vector.get_value('exec_option')['mt_dop'] = 
vector.get_value('mt_dop')
Nice.

Would you please repeat this section with the new query option: 
enable_outer_join_to_inner_transformation set to false?


The above query option is very important to shut off the transformation if 
there is a bug at a production site.



--
To view, visit http://gerrit.cloudera.org:8080/16266
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa7804033fac68e93f33c387dc68ef67f803e93e
Gerrit-Change-Number: 16266
Gerrit-PatchSet: 13
Gerrit-Owner: Xianqing He <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Shant Hovsepian <[email protected]>
Gerrit-Reviewer: Xianqing He <[email protected]>
Gerrit-Comment-Date: Wed, 26 Aug 2020 15:02:01 +0000
Gerrit-HasComments: Yes

Reply via email to