GitHub user ioana-delaney opened a pull request:
https://github.com/apache/spark/pull/15363
[SPARK-17791][SQL] Join reordering using star schema detection
## What changes were proposed in this pull request?
A star schema consists of one or more fact tables referencing a number of
dimension tables. In general, queries against a star schema are expected to run
fast because of the established referential integrity (RI) constraints among
the tables. This design proposes a join reordering based on natural, generally
accepted heuristics for star schema queries:
- Finds the star join with the largest fact table and places it on the
driving arm of the left-deep join. This plan avoids large tables on the inner
side of a join, and thus favors hash joins.
- Applies the most selective dimensions early in the plan to reduce the
amount of data flowing through the subsequent joins.
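
The two heuristics above can be illustrated with a minimal Python sketch. It is not the code in this patch: the `Table` class, its `rows` and `selectivity` fields, and `reorder_star_join` are hypothetical names, and the sketch assumes the input is already known to form a single star join, with the largest table taken to be the fact table.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Table:
    """Hypothetical stand-in for a table in a star join (not a Spark class)."""
    name: str
    rows: int           # assumed estimate of the table's row count
    selectivity: float  # assumed fraction of rows kept by local predicates (1.0 = no filter)


def reorder_star_join(tables: List[Table]) -> List[str]:
    """Return a left-deep join order: fact table first, then the
    dimensions from most selective to least selective."""
    if not tables:
        return []
    # Heuristic 1: the largest table is treated as the fact table and
    # placed on the driving arm of the left-deep join.
    fact = max(tables, key=lambda t: t.rows)
    dims = [t for t in tables if t is not fact]
    # Heuristic 2: apply the most selective dimensions first, i.e. the
    # ones whose predicates keep the smallest fraction of rows.
    dims.sort(key=lambda t: t.selectivity)
    return [fact.name] + [d.name for d in dims]


if __name__ == "__main__":
    star = [
        Table("date_dim", 73_000, 0.02),
        Table("store_sales", 2_880_000, 1.0),
        Table("item", 18_000, 0.5),
        Table("store", 12, 1.0),
    ]
    print(reorder_star_join(star))
```

In this sketch the fact table drives the join and the filtered `date_dim` is joined before the unfiltered `store`, so rows are eliminated as early as possible. How the patch itself detects star joins and estimates selectivity is described in the design document, not here.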
The design document was included in SPARK-17791.
## How was this patch tested?
A new test suite, StarJoinSuite.scala, was added.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ioana-delaney/spark starJoinReord2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/15363.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #15363
----
commit 518d8e5e66925f60cce4db8a4924ff89ead84c0a
Author: Ioana Delaney
Date: 2016-10-05T22:27:35Z
[SPARK-17791] Join reordering using star schema detection.
----