GitHub user gao8658 opened a pull request:
https://github.com/apache/spark/pull/1127
Spark SQL add LeftSemiBloomFilterBroadcastJoin
Hi all,
I want to submit a join operator called
LeftSemiBloomFilterBroadcastJoin (LeftSemiJoinBFB).
Sometimes the semi join's broadcast table can't fit in memory. In that case we
can encode it as a Bloom filter to reduce its size, and then broadcast the
filter to do a map-side join.
Part of the code references the HashJoin and BroadcastNestedLoopJoin
implementations. The Bloom filter code uses Shark's BloomFilter class.
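The idea above can be illustrated with a small standalone sketch. This is plain Python, not Spark's API and not the code in this PR (which builds on Shark's BloomFilter). The names `BloomFilter` and `left_semi_join_bloom` are hypothetical, and deriving the k bit positions by double hashing over one SHA-256 digest is an assumption. Sizing follows the standard formulas m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter (hypothetical sketch): m bits, k hash positions."""

    def __init__(self, expected_items: int, fp_rate: float):
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        n = max(1, expected_items)
        self.num_bits = max(1, math.ceil(-n * math.log(fp_rate) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Assumed scheme: double hashing with two 64-bit halves of a SHA-256 digest.
        digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step size
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key):
        # Set all k bits for this key.
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # True if all k bits are set: "maybe present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def left_semi_join_bloom(left_rows, right_keys, key_of, fp_rate=0.01):
    """Keep left rows whose join key may appear among right_keys (approximate)."""
    right_keys = list(right_keys)
    bf = BloomFilter(len(right_keys), fp_rate)
    for k in right_keys:
        bf.add(k)
    # In a cluster, only `bf` (not the right table) would need to be broadcast;
    # each partition of the left side is then filtered locally (map side).
    return [row for row in left_rows if bf.might_contain(key_of(row))]
```

A Bloom filter has no false negatives, so every left row with a real match is kept, but roughly a fraction `fp_rate` of non-matching rows can also pass. If exact LEFT SEMI JOIN semantics are required, the surviving rows would still need to be checked against the real keys. For n = 1,000,000 keys and p = 0.01 the formulas give about 9.6 million bits (roughly 1.2 MB) and 7 hash functions, regardless of how wide each key is.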
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gao8658/spark patch-2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/1127.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1127
----
commit 470243cce24bee9a6c49a2f1538932ff3dba8f64
Author: Yanjie Gao <[email protected]>
Date: 2014-06-19T04:40:53Z
Spark SQL add LeftSemiBloomFilterBroadcastJoin
----