GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/10566

    Outer join elimination by parent join predicate

    This PR is another enhancement to the Optimizer. It does not conflict with the 
other PRs (https://github.com/apache/spark/pull/10542 and 
https://github.com/apache/spark/pull/10551).  
    
    When an outer join (the child join) is an input to another join (the parent 
join), and the parent join is an inner, left-semi, left-outer, or right-outer 
join, check whether the join condition of the parent join satisfies the 
following two conditions:
      1) there are null-filtering predicates on columns from the 
null-supplying side of the parent join.
      2) these columns come from the child join.
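
To illustrate why a null-filtering predicate in the parent join makes the child outer join redundant, here is a minimal Python sketch. It is not Spark code: the helpers `inner_join`, `left_outer_join`, and `sql_eq`, and the toy tables `a`, `b`, `c`, are hypothetical names introduced only for this example. SQL NULL is modeled as `None`, and an equality comparison involving `None` is treated as not satisfied.

```python
def inner_join(left, right, cond):
    """Return merged rows for every (l, r) pair that satisfies cond."""
    return [{**l, **r} for l in left for r in right if cond(l, r)]

def left_outer_join(left, right, cond, right_cols):
    """Like inner_join, but keep unmatched left rows, padded with None."""
    out = []
    for l in left:
        matches = [{**l, **r} for r in right if cond(l, r)]
        out.extend(matches or [{**l, **{col: None for col in right_cols}}])
    return out

def sql_eq(x, y):
    """SQL-style equality: a comparison with NULL (None) is never true."""
    return x is not None and y is not None and x == y

# Hypothetical toy tables.
a = [{"a_id": 1}, {"a_id": 2}]
b = [{"b_id": 1, "b_k": 10}]
c = [{"c_k": 10}]

child_cond = lambda l, r: sql_eq(l["a_id"], r["b_id"])
# The parent condition is null-filtering on b_k, a column from the right
# (null-supplying) side of the child left outer join.
parent_cond = lambda l, r: sql_eq(l["b_k"], r["c_k"])

child_outer = left_outer_join(a, b, child_cond, ["b_id", "b_k"])
child_inner = inner_join(a, b, child_cond)

with_outer_child = inner_join(child_outer, c, parent_cond)
with_inner_child = inner_join(child_inner, c, parent_cond)
```

In this sketch the row for `a_id = 2` is padded with `None` by the child left outer join and then discarded by the parent condition, so `with_outer_child` and `with_inner_child` hold the same rows.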
    
    If such join predicates exist, apply the following elimination rules:
     - full outer -> inner if both sides of the child join have such predicates
     - left outer -> inner if the right side of the child join has such 
predicates
     - right outer -> inner if the left side of the child join has such 
predicates
     - full outer -> left outer if only the left side of the child join has 
such predicates
     - full outer -> right outer if only the right side of the child join has 
such predicates
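
The five rules above can be read as a single decision function. The following Python sketch is an illustration only and is not the code in this PR: the function name `eliminate_outer_join` and the string labels for join types are hypothetical. The two flags state whether the parent join condition has a null-filtering predicate on columns from the left or right side of the child join.

```python
def eliminate_outer_join(join_type, left_has_pred, right_has_pred):
    """Return the child join type after applying the elimination rules.

    join_type: "inner", "left_outer", "right_outer", or "full_outer"
    (hypothetical labels, not Spark's join type classes).
    left_has_pred / right_has_pred: True if the parent join condition has a
    null-filtering predicate on columns from that side of the child join.
    """
    if join_type == "full_outer":
        if left_has_pred and right_has_pred:
            return "inner"
        if left_has_pred:
            return "left_outer"
        if right_has_pred:
            return "right_outer"
    elif join_type == "left_outer" and right_has_pred:
        return "inner"
    elif join_type == "right_outer" and left_has_pred:
        return "inner"
    # No rule applies: leave the join type unchanged.
    return join_type
```

For example, `eliminate_outer_join("full_outer", True, False)` returns `"left_outer"`, while a left outer join with a predicate only on its left side is left unchanged.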
    
    When applicable, this can greatly improve performance, because an outer 
join is much slower than an inner join, and a full outer join is much slower 
than a left or right outer join. 
    
    BTW, since this rule differs from the rule in 
https://github.com/apache/spark/pull/10542, I kept it in a separate PR to 
simplify the code review. 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark OuterJoinEliminationByParentJoinPredicate

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10566.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10566
    
----
commit bde74f83e24c2dc9bd9fd9e5541362049594c972
Author: gatorsmile <[email protected]>
Date:   2016-01-03T17:45:38Z

    Merge remote-tracking branch 'upstream/master' into OuterJoinEliminationByParentJoinPredicate

commit e18ba758aa94cc75115cc689f49b75ccd5d0ce51
Author: gatorsmile <[email protected]>
Date:   2016-01-04T02:21:59Z

    Merge remote-tracking branch 'upstream/master' into OuterJoinEliminationByParentJoinPredicate

commit d6a6e9cc31b0f7547b35cf25884135ea65b03676
Author: gatorsmile <[email protected]>
Date:   2016-01-04T02:40:26Z

    outer join elimination by parent join.

----

