GitHub user hvanhovell opened a pull request:
https://github.com/apache/spark/pull/11083
[SPARK-13136][SQL] Create a dedicated Broadcast exchange operator [WIP]
Quite a few Spark SQL join operators broadcast one side of the join to all
nodes. There are a few problems with this:
- This conflates broadcasting (a data exchange) with joining. Data
exchanges should be managed by a different operator.
- All these nodes implement their own (duplicate) broadcasting logic.
- Reusing frequently used indices is quite hard.
This PR defines ```Broadcast``` as a distinct kind of ```Distribution```.
To satisfy this distribution, we implement a ```Broadcast``` operator and have
```EnsureRequirements``` plan it.
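The planning idea can be sketched with a small self-contained toy model. This is not Spark's actual code: `Plan`, `Scan`, `BroadcastJoin`, `BroadcastExchange`, `BroadcastDistribution` and `EnsureRequirementsSketch` are hypothetical names chosen for illustration, and the real operators in this PR may look quite different. The sketch only shows a join declaring that one child must be broadcast, and a separate rule inserting the exchange operator where that requirement is not yet met.

```scala
// Toy model only: these types imitate the idea and are NOT Spark's real classes.
sealed trait Distribution
case object UnspecifiedDistribution extends Distribution
// Hypothetical stand-in for the PR's "Broadcast" kind of Distribution.
case object BroadcastDistribution extends Distribution

sealed trait Plan {
  def children: Seq[Plan]
  // Distribution each child must satisfy; by default there is no requirement.
  def requiredChildDistribution: Seq[Distribution] =
    children.map(_ => UnspecifiedDistribution)
  // Whether this node's output is already available on all nodes.
  def outputIsBroadcast: Boolean = false
}

case class Scan(table: String) extends Plan {
  val children: Seq[Plan] = Seq.empty
}

// Hypothetical dedicated exchange operator: it owns the broadcasting step.
case class BroadcastExchange(child: Plan) extends Plan {
  val children: Seq[Plan] = Seq(child)
  override def outputIsBroadcast: Boolean = true
}

// The join no longer broadcasts anything itself; it only states a requirement.
case class BroadcastJoin(streamed: Plan, build: Plan) extends Plan {
  val children: Seq[Plan] = Seq(streamed, build)
  override def requiredChildDistribution: Seq[Distribution] =
    Seq(UnspecifiedDistribution, BroadcastDistribution)
}

object EnsureRequirementsSketch {
  // Rebuild a node with new children (kept explicit for this toy model).
  private def withChildren(plan: Plan, newChildren: Seq[Plan]): Plan = plan match {
    case s: Scan              => s
    case BroadcastExchange(_) => BroadcastExchange(newChildren.head)
    case BroadcastJoin(_, _)  => BroadcastJoin(newChildren(0), newChildren(1))
  }

  // Bottom-up: plan the children first, then wrap any child that must be
  // broadcast but is not yet broadcast in a BroadcastExchange.
  def apply(plan: Plan): Plan = {
    val planned = plan.children.map(apply)
    val fixed = planned.zip(plan.requiredChildDistribution).map {
      case (child, BroadcastDistribution) if !child.outputIsBroadcast =>
        BroadcastExchange(child)
      case (child, _) => child
    }
    withChildren(plan, fixed)
  }
}
```

In this toy model, `EnsureRequirementsSketch(BroadcastJoin(Scan("facts"), Scan("dims")))` wraps only the build side in a `BroadcastExchange`, and applying the rule a second time leaves the plan unchanged because the requirement is already satisfied.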
TODOs:
- [ ] Fix code generation.
- [ ] Add other broadcasting operators.
cc @rxin @davies
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/hvanhovell/spark SPARK-13136
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11083.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11083
----
commit aa7120e0cd8b40a9d0b3edf7c33f18a530d597bc
Author: Herman van Hovell
Date: 2016-02-04T19:13:42Z
Initial Broadcast design
commit c2b7533f1fb30e9d93856adf4cef4107945670cc
Author: Herman van Hovell
Date: 2016-02-04T21:34:34Z
Fix Exchange and initial code gen attempt.
----