GitHub user ioana-delaney opened a pull request:

    https://github.com/apache/spark/pull/15363

    [SPARK-17791][SQL] Join reordering using star schema detection

    ## What changes were proposed in this pull request?
    
    A star schema consists of one or more fact tables referencing a number of
dimension tables. In general, queries against a star schema are expected to run
fast because of the established referential integrity (RI) constraints among
the tables. This design proposes a join reordering based on natural, generally
accepted heuristics for star schema queries:
    - Finds the star join with the largest fact table and places it on the
driving arm of the left-deep join. This plan avoids large tables on the inner
side of the join, and thus favors hash joins.
    - Applies the most selective dimensions early in the plan to reduce the
amount of data flowing through the remaining joins.
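
The two heuristics above can be illustrated with a minimal, self-contained sketch. It is not the code in this pull request and does not use Spark's Catalyst API: `Table`, `StarJoinSketch`, and `reorder` are hypothetical names. It assumes that the fact table is simply the table with the most rows and that each dimension carries a precomputed selectivity (the fraction of its rows that survive its local predicates; lower means more selective). Star join detection through RI constraints is left out.

```scala
// Hypothetical, simplified model of a table in a star join (not a Spark class).
// selectivity: assumed fraction of rows kept by the table's local predicates.
final case class Table(name: String, rowCount: Long, selectivity: Double)

object StarJoinSketch {
  /** Returns a left-deep join order: the largest table (assumed to be the
    * fact table) drives the join, followed by the dimension tables with the
    * most selective one first.
    */
  def reorder(tables: Seq[Table]): Seq[Table] =
    if (tables.isEmpty) tables
    else {
      // Heuristic 1: the largest table goes on the driving arm.
      val fact = tables.maxBy(_.rowCount)
      // Heuristic 2: most selective dimensions (lowest fraction kept) first.
      val dims = tables.filterNot(_ eq fact).sortBy(_.selectivity)
      fact +: dims
    }
}
```

For example, given the made-up tables `Table("date_dim", 73000L, 0.05)`, `Table("store_sales", 2880000L, 1.0)`, and `Table("item", 18000L, 0.5)`, `StarJoinSketch.reorder` places `store_sales` first, then `date_dim`, then `item`.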
    
    The design document was included in SPARK-17791.
    
    ## How was this patch tested?
    
    A new test suite, StarJoinSuite.scala, was added.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ioana-delaney/spark starJoinReord2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15363.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15363
    
----
commit 518d8e5e66925f60cce4db8a4924ff89ead84c0a
Author: Ioana Delaney <[email protected]>
Date:   2016-10-05T22:27:35Z

    [SPARK-17791] Join reordering using star schema detection.

----


