GitHub user gao8658 opened a pull request:

    https://github.com/apache/spark/pull/1127

    Spark SQL add LeftSemiBloomFilterBroadcastJoin

    Hi all,

    I would like to submit a join operator called
    LeftSemiBloomFilterBroadcastJoin (LeftSemiJoinBFB).

    Sometimes the broadcast table of a semi join cannot fit in memory. In that
    case we can build a Bloom filter from it to reduce its size, broadcast the
    filter, and then perform the map-side join.

    Some of the code is based on the HashJoin and BroadcastNestedLoopJoin
    implementations. The Bloom filter code uses Shark's BloomFilter class
    implementation.
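
The idea described above can be illustrated with a minimal, self-contained
sketch. It is not the code in this pull request and does not use Spark's or
Shark's APIs: the names BloomFilter, left_semi_join_bloom, left_key, and
right_key are hypothetical, the sizing formulas are the standard textbook ones,
and plain Python lists stand in for the two tables. Note that a Bloom filter
can return false positives, so a join that relies on the filter alone may keep
a few left rows that have no real match; it never drops a row that does match.

```python
import hashlib
import math


class BloomFilter:
    """Tiny illustrative Bloom filter (hypothetical, not Shark's class)."""

    def __init__(self, expected_items, false_positive_rate=0.01):
        n = max(1, expected_items)
        # Textbook sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(8, int(-n * math.log(false_positive_rate) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive k bit positions from two halves of one digest.
        # Keys are hashed via repr(), so 1 and 1.0 count as different keys.
        digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def left_semi_join_bloom(left_rows, right_rows, left_key, right_key):
    """Keep left rows whose join key may appear on the right side."""
    right_keys = [right_key(row) for row in right_rows]
    # Build the compact filter from the (small) right side; in a cluster this
    # filter, not the table itself, is what would be broadcast.
    bloom = BloomFilter(len(right_keys))
    for key in right_keys:
        bloom.add(key)
    # Map-side step: each left row is tested locally against the filter.
    return [row for row in left_rows if bloom.might_contain(left_key(row))]
```

For example, calling left_semi_join_bloom(orders, customers, lambda r: r[0],
lambda r: r[0]) on two lists of tuples keeps every order whose first field
matches a customer key. If exact semi-join semantics are required, the
surviving rows would need a second check against the real keys.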

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gao8658/spark patch-2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1127.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1127
    
----
commit 470243cce24bee9a6c49a2f1538932ff3dba8f64
Author: Yanjie Gao <[email protected]>
Date:   2014-06-19T04:40:53Z

    Spark SQL add LeftSemiBloomFilterBroadcastJoin
    
    Hi all,

    I would like to submit a join operator called
    LeftSemiBloomFilterBroadcastJoin (LeftSemiJoinBFB).

    Sometimes the broadcast table of a semi join cannot fit in memory. In that
    case we can build a Bloom filter from it to reduce its size, broadcast the
    filter, and then perform the map-side join.

    Some of the code is based on the HashJoin and BroadcastNestedLoopJoin
    implementations. The Bloom filter code uses Shark's BloomFilter class
    implementation.

----


