GitHub user ankurdave opened a pull request:

    https://github.com/apache/spark/pull/9089

    [SPARK-11077] [SQL] Join elimination in Catalyst

    Join elimination is a query optimization that removes a join when it is
    followed by a projection that keeps columns from only one side of the
    join, and when the relevant columns are known to be unique keys or
    foreign keys. This can be very useful for queries involving views and
    for machine-generated queries.
    
    This PR adds join elimination by (1) supporting unique and foreign key
    hints in logical plans, (2) adding methods in the DataFrame API to let
    users provide these hints, and (3) adding an optimizer rule that
    eliminates unique key outer joins and referential integrity joins when
    followed by an appropriate projection.
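
To illustrate why a unique key outer join can be dropped, here is a minimal sketch in plain Python. It is not Spark or Catalyst code and does not use the hint methods this PR adds; the relations, column names, and helper functions (`left_outer_join`, `project`, `is_unique`) are all hypothetical and exist only for this example. It models relations as lists of dicts and checks that, when the right-side join column is unique, projecting only left-side columns gives the same rows with or without the join.

```python
# Toy model of join elimination. Relations are lists of dict rows.
# All names here are illustrative assumptions, not part of the PR.

def left_outer_join(left, right, left_key, right_key):
    """Naive left outer join; unmatched left rows get None for right columns."""
    right_cols = list(right[0].keys()) if right else []
    out = []
    for l in left:
        # SQL semantics: a NULL key never matches anything.
        matches = [r for r in right
                   if l[left_key] is not None and r[right_key] == l[left_key]]
        if matches:
            out.extend({**l, **r} for r in matches)
        else:
            out.append({**l, **{c: None for c in right_cols}})
    return out

def project(rows, cols):
    """Keep only the named columns of each row."""
    return [{c: row[c] for c in cols} for row in rows]

def is_unique(rows, key):
    """True if no value of `key` appears in more than one row."""
    vals = [r[key] for r in rows]
    return len(vals) == len(set(vals))

orders = [
    {"order_id": 1, "cust_id": 10, "amount": 5.0},
    {"order_id": 2, "cust_id": 20, "amount": 7.5},
    {"order_id": 3, "cust_id": None, "amount": 1.0},
    {"order_id": 4, "cust_id": 99, "amount": 2.0},
]
customers = [{"id": 10, "name": "a"}, {"id": 20, "name": "b"}]

# The projection keeps only columns from the left side of the join.
left_cols = ["order_id", "amount"]
with_join = project(left_outer_join(orders, customers, "cust_id", "id"), left_cols)
without_join = project(orders, left_cols)

# customers.id is unique, so each order appears exactly once after the join
# and the join can be skipped.
unique_case_equal = is_unique(customers, "id") and with_join == without_join

# If the key is not unique, the join duplicates rows and cannot be dropped.
customers_dup = customers + [{"id": 10, "name": "c"}]
with_dup_join = project(
    left_outer_join(orders, customers_dup, "cust_id", "id"), left_cols)
dup_case_equal = with_dup_join == without_join

print(unique_case_equal, dup_case_equal)
```

The second case shows why the uniqueness hint matters: without it, the join may change the number of output rows even though none of its columns survive the projection.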
    
    This change is described in detail here: 
https://docs.google.com/document/d/1-YgQSQywHfAo4PhAT-zOOkFZtVcju99h3dYQq-i9GWQ/edit?usp=sharing

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ankurdave/spark SPARK-11077-JoinElimination

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9089.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9089
    
----
commit 4f528770ecf4a2ae780d6514fdc8c5e7cf899288
Author: Ankur Dave <[email protected]>
Date:   2015-08-04T05:33:59Z

    Eliminate outer join before project

commit ae46ab0891e974f6491d4b266f08d95d7a1c1382
Author: Ankur Dave <[email protected]>
Date:   2015-08-12T20:15:50Z

    Use KeyHint to do join elimination

commit df9ef1421cee2f8f94dac24a8116ad504a009a20
Author: Ankur Dave <[email protected]>
Date:   2015-08-12T23:25:30Z

    Add foreign keys

commit b22f7025860fed1b3f7bd5147691f5ef887bca01
Author: Ankur Dave <[email protected]>
Date:   2015-08-13T02:49:26Z

    Alias-aware join elimination + bugfixes

commit 9072cb70872b156027cb2e673a397cc01f326128
Author: Ankur Dave <[email protected]>
Date:   2015-08-13T03:22:55Z

    Propagate foreign keys through Join operator

commit f430ea2c6413879403973fc4fdd4217dde9d27ec
Author: Ankur Dave <[email protected]>
Date:   2015-08-13T03:43:06Z

    Remove key hints after join elimination

commit 130253101f2db627c42ea4f8759dfeef6c62e574
Author: Ankur Dave <[email protected]>
Date:   2015-08-17T01:55:36Z

    Support inner joins based on referential integrity

commit 35949f54c53357a86e0a2e2aeb0e5524a8285ce5
Author: Ankur Dave <[email protected]>
Date:   2015-08-18T06:38:30Z

    Correctness fixes for join elimination
    
    Do not eliminate referential integrity full outer joins, or inner joins
    where the foreign key is nullable. Require foreign keys to reference
    unique columns.

commit 945e5231e900621c4a2bbf103816385d68abd5e0
Author: Ankur Dave <[email protected]>
Date:   2015-08-19T06:15:31Z

    Do key hint resolution during analysis
    
    This is necessary to support aliased self joins and multiple foreign
    keys with the same referent.

commit 504c9d858b8b35ed788e31bf99fc5f6506be792d
Author: Ankur Dave <[email protected]>
Date:   2015-08-19T06:18:02Z

    Don't crash when foreign key refers to unresolved relation
    
    Instead just leave the KeyHint unresolved.

commit 83c8ff913dc06f79ce059906e62b0e744967c1e4
Author: Ankur Dave <[email protected]>
Date:   2015-08-19T07:42:04Z

    Fix JoinEliminationSuite

commit 0b0b8401f97bf52dabacfa818fa62a4477ca4c72
Author: Ankur Dave <[email protected]>
Date:   2015-08-19T11:01:43Z

    Merge remote-tracking branch 'apache-spark/master' into GraphFrames

commit 9150ddaf2d598314ff3ea1fe4a434de37325d213
Author: Ankur Dave <[email protected]>
Date:   2015-08-19T12:14:53Z

    Fix KeyHintSuite after merge

commit 873b3224b043875718959c645146743ed78084da
Author: Ankur Dave <[email protected]>
Date:   2015-10-13T01:47:47Z

    In ForeignKey, store referencedRelation as logical plan
    
    Previously we stored its name as part of referencedAttr, requiring a
    catalog lookup.

commit 98e0b5e316b1692a188dedc6b49daaa5854a064b
Author: Ankur Dave <[email protected]>
Date:   2015-10-13T02:45:21Z

    Use semanticEquals for Attributes

commit d43a2c005b091e571a9d5dc3cc7d22e22a29ffd0
Author: Ankur Dave <[email protected]>
Date:   2015-10-13T03:37:35Z

    Remove TODOs

commit f4e7e0140865df27f3c0b000f22d69117316070e
Author: Ankur Dave <[email protected]>
Date:   2015-10-13T04:02:02Z

    Add more comments

commit 49b196e041c80c83eef0b069c984e608cc6433b5
Author: Ankur Dave <[email protected]>
Date:   2015-10-13T04:13:46Z

    Merge remote-tracking branch 'apache-spark/master' into GraphFrames

commit 578797c456e20d0fb07bf10cb3e64f09065948f9
Author: Ankur Dave <[email protected]>
Date:   2015-10-13T04:38:46Z

    Use SharedSQLContext in KeyHintSuite

----
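
The "Correctness fixes for join elimination" commit above excludes inner joins where the foreign key is nullable. The following plain-Python sketch shows one way to see why. As before, this is not Spark code; the tables, column names, and the helpers `inner_join` and `project` are hypothetical and chosen only for illustration.

```python
# Toy model: why a nullable foreign key blocks inner join elimination.
# All names here are illustrative assumptions, not part of the PR.

def inner_join(left, right, left_key, right_key):
    """Naive inner join; a None (NULL) key never matches."""
    return [{**l, **r}
            for l in left
            for r in right
            if l[left_key] is not None and l[left_key] == r[right_key]]

def project(rows, cols):
    """Keep only the named columns of each row."""
    return [{c: row[c] for c in cols} for row in rows]

# customers.id is unique and is the target of the foreign key orders.cust_id.
customers = [{"id": 10, "name": "a"}, {"id": 20, "name": "b"}]

# Every cust_id is non-null and refers to an existing customer.
orders_non_null = [
    {"order_id": 1, "cust_id": 10},
    {"order_id": 2, "cust_id": 20},
]
# Same data plus one row whose foreign key is NULL.
orders_nullable = orders_non_null + [{"order_id": 3, "cust_id": None}]

cols = ["order_id"]

# Non-nullable foreign key: every order matches exactly one customer,
# so the join neither drops nor duplicates rows and can be eliminated.
safe = (project(inner_join(orders_non_null, customers, "cust_id", "id"), cols)
        == project(orders_non_null, cols))

# Nullable foreign key: the inner join drops order 3, so removing the
# join would change the result.
unsafe = (project(inner_join(orders_nullable, customers, "cust_id", "id"), cols)
          == project(orders_nullable, cols))

print(safe, unsafe)
```

In this toy model a left outer join would keep order 3, which is consistent with the commit restricting only the inner join case for nullable foreign keys.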

