GitHub user rxin opened a pull request:
https://github.com/apache/spark/pull/2722
[SPARK-3861][SQL] Avoid rebuilding hash tables on each partition
BroadcastHashJoin currently builds a new hash table for each partition. We can
instead build it once per node and reuse it across that node's partitions.
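
To illustrate the idea only, here is a minimal sketch in plain Python that does not use Spark and is not the code in this pull request. All names (`build_hash_table`, `join_partition`, `join_rebuilding`, `join_reusing`) are hypothetical, and "once per node" is modelled simply as "once for all partitions handled by the same process". Each join function also returns how many hash tables it built, so the difference is visible.

```python
from collections import defaultdict


def build_hash_table(rows, key):
    """Index the small (build-side) relation by its join key."""
    table = defaultdict(list)
    for row in rows:
        table[key(row)].append(row)
    return table


def join_partition(partition, table, key):
    """Probe the hash table with every row of one streamed partition."""
    return [
        (left, right)
        for left in partition
        for right in table.get(key(left), [])
    ]


def join_rebuilding(partitions, small, left_key, right_key):
    """Behaviour described as the problem: one hash table per partition."""
    out, builds = [], 0
    for partition in partitions:
        table = build_hash_table(small, right_key)  # rebuilt every time
        builds += 1
        out.extend(join_partition(partition, table, left_key))
    return out, builds


def join_reusing(partitions, small, left_key, right_key):
    """Proposed behaviour: build once, then reuse for every partition."""
    table = build_hash_table(small, right_key)  # built a single time
    out = []
    for partition in partitions:
        out.extend(join_partition(partition, table, left_key))
    return out, 1
```

Both functions produce the same joined rows; only the number of hash-table builds differs, which grows with the partition count in the first variant and stays at one in the second.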
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/rxin/spark SPARK-3861-broadcast-hash
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/2722.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2722
----
commit a39be8c06d3397ff834b1defad37ce1ca1824183
Author: Reynold Xin <[email protected]>
Date: 2014-10-08T22:22:34Z
[SPARK-3857] Create a join package for various join operators.
commit a070d44aa31a6af4cd8d45fc2c02adef61bb03b9
Author: Reynold Xin <[email protected]>
Date: 2014-10-08T22:26:52Z
Fix line length in HashJoin
commit cbc664c87c2b0e6437990ca09c8771e34d9816e3
Author: Reynold Xin <[email protected]>
Date: 2014-10-08T23:52:11Z
Rename join -> joins package.
commit 0c0082b5d656a57dee41d97f69a212d36a3c3533
Author: Reynold Xin <[email protected]>
Date: 2014-10-08T23:55:39Z
Fix line length.
commit 90b58c0aed328a329d96bed32b4293f3ac3a208b
Author: Reynold Xin <[email protected]>
Date: 2014-10-08T23:56:54Z
[SPARK-3861] Avoid rebuilding hash tables on each partition
BroadcastHashJoin builds a new hash table for each partition. We can build
it once per node and reuse the hash table.
----