GitHub user gatorsmile opened a pull request:
https://github.com/apache/spark/pull/10566
Outer join elimination by parent join predicate
This PR adds another enhancement to the Optimizer. It does not conflict with
the other PRs (https://github.com/apache/spark/pull/10542 and
https://github.com/apache/spark/pull/10551).
When an outer join is the child of another join (called the parent join) and
the join type of the parent join is inner, left-semi, left-outer, or
right-outer, the rule checks whether the join condition of the parent join
satisfies the following two conditions:
1) The condition contains null-filtering predicates (predicates that cannot
evaluate to true when the column is NULL) on columns in the null-supplying
side of the parent join.
2) These columns come from the child join.
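To illustrate why a null-filtering predicate makes the outer join unnecessary,
here is a minimal in-memory sketch in plain Python (not Spark code). The
tables `a`, `b`, `c`, their columns, and the helpers `join` and `eq` are all
hypothetical. The child join is `a LEFT OUTER JOIN b` and the parent is an
inner join with `c` on `b_v = c_v`, which is the simplest case. Equality is
never true for NULL, so the parent discards every row the child null-padded.

```python
def join(left, right, cond, how="inner"):
    """Nested-loop join over lists of dicts; None plays the role of SQL NULL."""
    right_cols = sorted({col for row in right for col in row})
    out = []
    for l in left:
        matches = [dict(l, **r) for r in right if cond(l, r)]
        if not matches and how == "left_outer":
            # No match: keep the left row and null-pad the right columns.
            matches = [dict(l, **{col: None for col in right_cols})]
        out.extend(matches)
    return out


def eq(x, y):
    """SQL-style equality: a comparison involving NULL is never true."""
    return x is not None and y is not None and x == y


a = [{"a_k": 1}, {"a_k": 2}, {"a_k": 3}]
b = [{"b_k": 1, "b_v": 10}, {"b_k": 2, "b_v": None}]
c = [{"c_v": 10}, {"c_v": 20}]

child_on = lambda l, r: eq(l["a_k"], r["b_k"])
parent_on = lambda l, r: eq(l["b_v"], r["c_v"])  # null-filtering on b_v

# Child evaluated as written (left outer) versus rewritten (inner).
with_outer = join(join(a, b, child_on, "left_outer"), c, parent_on)
with_inner = join(join(a, b, child_on, "inner"), c, parent_on)
```

Both plans return the same single row here, because the rows that only the
left outer join produces carry NULL in `b_v` and can never satisfy the parent
condition.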
If such join predicates exist, the following elimination rules apply:
- full outer -> inner if both sides of the child join have such predicates
- left outer -> inner if the right side of the child join has such
predicates
- right outer -> inner if the left side of the child join has such
predicates
- full outer -> left outer if only the left side of the child join has
such predicates
- full outer -> right outer if only the right side of the child join has
such predicates
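The rules above can be sketched as a small decision function. This is only an
illustration in plain Python, not the code in this PR: the function name
`eliminate_outer_join`, the join-type strings, and the two boolean flags are
assumptions made for the example. Each flag says whether the parent join
condition has a null-filtering predicate on columns from that side of the
child join.

```python
def eliminate_outer_join(child_type, left_filtered, right_filtered):
    """Return the join type the child join can be rewritten to.

    left_filtered / right_filtered: whether the parent join condition has a
    null-filtering predicate on columns from that side of the child join.
    """
    if child_type == "full_outer":
        if left_filtered and right_filtered:
            return "inner"
        if left_filtered:
            return "left_outer"
        if right_filtered:
            return "right_outer"
    elif child_type == "left_outer" and right_filtered:
        return "inner"
    elif child_type == "right_outer" and left_filtered:
        return "inner"
    return child_type  # no applicable rule: leave the join unchanged
```

For example, `eliminate_outer_join("full_outer", True, False)` returns
`"left_outer"`, while a left outer join with a predicate only on its left
side is returned unchanged.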
When applicable, this can greatly improve performance, since an outer join
is much slower than an inner join, and a full outer join is much slower than
a left or right outer join.
BTW, since this rule differs from the rule in
https://github.com/apache/spark/pull/10542, I did not merge them into a
single PR, to simplify the code review.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gatorsmile/spark OuterJoinEliminationByParentJoinPredicate
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/10566.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #10566
----
commit bde74f83e24c2dc9bd9fd9e5541362049594c972
Author: gatorsmile <[email protected]>
Date: 2016-01-03T17:45:38Z
Merge remote-tracking branch 'upstream/master' into
OuterJoinEliminationByParentJoinPredicate
commit e18ba758aa94cc75115cc689f49b75ccd5d0ce51
Author: gatorsmile <[email protected]>
Date: 2016-01-04T02:21:59Z
Merge remote-tracking branch 'upstream/master' into
OuterJoinEliminationByParentJoinPredicate
commit d6a6e9cc31b0f7547b35cf25884135ea65b03676
Author: gatorsmile <[email protected]>
Date: 2016-01-04T02:40:26Z
outer join elimination by parent join.
----