GitHub user marmbrus opened a pull request:

    https://github.com/apache/spark/pull/734

    [SQL] SPARK-1800 Add broadcast hash join operator

    WIP: A few things remain, but looking for feedback on this approach.
    
     - [ ] Figure out how to configure this.  The immutability of SparkConf is
       probably not great for things like query hints.
     - [ ] Figure out how to test this.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/marmbrus/spark broadcastHashJoin

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/734.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #734
    
----
commit a8420ca0c4cbc5988607d0cd235ffeb2cb51d052
Author: Michael Armbrust <[email protected]>
Date:   2014-05-11T18:23:02Z

    Copy records in executeCollect to avoid issues with mutable rows.

commit cf6b3818fbe7d1908bcbdc7f18c5773c01d05541
Author: Michael Armbrust <[email protected]>
Date:   2014-05-11T18:30:56Z

    Split out generic logic for hash joins and create two concrete physical
    operators: BroadcastHashJoin and ShuffledHashJoin.

commit 76ca4341036b95f71763f631049fdae033990ab5
Author: Michael Armbrust <[email protected]>
Date:   2014-05-11T18:31:20Z

    A simple strategy that broadcasts tables only when they are found in a
    configuration hint.

----
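
For readers unfamiliar with the two operators named in the commits above, here is a minimal sketch of the idea, not the code in this pull request. It uses plain Scala collections in place of RDDs and rows, and every name in it (`HashJoinSketch`, `hashJoin`, `broadcastTables`, the comma-separated hint format) is hypothetical. The shared logic builds a hash table from one side and probes it with the other; the two variants differ only in how the build side reaches each partition.

```scala
// Illustrative sketch only: plain collections stand in for Spark's RDDs and rows.
// All names here are hypothetical and are not taken from the pull request.
object HashJoinSketch {
  type Row = Seq[Any]

  // Generic logic: hash the build side by key, then probe it with each streamed row.
  // Output rows are the streamed row followed by the matching build row.
  def hashJoin(
      build: Iterator[Row],
      streamed: Iterator[Row],
      buildKey: Row => Any,
      streamKey: Row => Any): Iterator[Row] = {
    val table: Map[Any, Seq[Row]] = build.toVector.groupBy(buildKey)
    streamed.flatMap { s =>
      table.getOrElse(streamKey(s), Seq.empty[Row]).map(b => s ++ b)
    }
  }

  // Broadcast variant: the whole small table is made available to every
  // partition of the large table, so the large side is never repartitioned.
  def broadcastHashJoin(
      small: Seq[Row],
      largePartitions: Seq[Seq[Row]],
      smallKey: Row => Any,
      largeKey: Row => Any): Seq[Row] =
    largePartitions.flatMap { p =>
      hashJoin(small.iterator, p.iterator, smallKey, largeKey)
    }

  // Shuffled variant: both sides are partitioned by key hash, then matching
  // partitions are joined with the same generic logic.
  def shuffledHashJoin(
      left: Seq[Row],
      right: Seq[Row],
      leftKey: Row => Any,
      rightKey: Row => Any,
      numPartitions: Int): Seq[Row] = {
    def partition(rows: Seq[Row], key: Row => Any): Map[Int, Seq[Row]] =
      rows.groupBy(r => java.lang.Math.floorMod(key(r).##, numPartitions))
    val l = partition(left, leftKey)
    val r = partition(right, rightKey)
    (0 until numPartitions).flatMap { i =>
      hashJoin(
        r.getOrElse(i, Seq.empty[Row]).iterator,
        l.getOrElse(i, Seq.empty[Row]).iterator,
        rightKey,
        leftKey)
    }
  }

  // Assumed hint format: a comma-separated list of table names to broadcast.
  def broadcastTables(hint: String): Set[String] =
    hint.split(",").map(_.trim).filter(_.nonEmpty).toSet

  // Broadcast only when a table is named in the hint; otherwise shuffle.
  def useBroadcast(tableName: String, hint: String): Boolean =
    broadcastTables(hint).contains(tableName)
}
```

Both variants return the same rows for an inner equi-join; the choice between them is purely a question of data movement, which is why a hint-driven strategy like the one in the last commit is enough to switch operators.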

