[GitHub] spark pull request: [SPARK-3084] [SQL] Collect broadcasted tables ...

mateiz Sat, 16 Aug 2014 19:11:36 -0700

GitHub user mateiz opened a pull request:

    https://github.com/apache/spark/pull/1990


    [SPARK-3084] [SQL] Collect broadcasted tables in parallel in joins

    BroadcastHashJoin has a broadcastFuture variable that tries to collect
    the broadcasted table in a separate thread, but this doesn't help
    because it's a lazy val that only gets initialized when you attempt to
    build the RDD. Thus queries that broadcast multiple tables would collect
    and broadcast them sequentially. I changed this to a val to let it start
    collecting right when the operator is created.
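
    The difference between the two declarations can be illustrated with a
    small standalone sketch. This is not the Spark source: `collectTable`,
    `LazyJoin`, `EagerJoin`, and the 500 ms delay are hypothetical stand-ins
    for the operator and for collecting a broadcast table. It only shows
    that a Future held in a `lazy val` does not start until first access,
    while one held in a `val` starts at construction.

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object LazyVsEagerFuture {
  // Hypothetical delay standing in for collecting one broadcast table.
  val DelayMs = 500L

  // Hypothetical stand-in for the collect-and-broadcast work.
  def collectTable(name: String): String = {
    // `blocking` lets the global pool add threads while this one sleeps.
    blocking { Thread.sleep(DelayMs) }
    name
  }

  // Lazy variant: the Future is created only on first access.
  class LazyJoin(table: String) {
    lazy val broadcastFuture: Future[String] = Future(collectTable(table))
  }

  // Eager variant: the Future is created when the object is constructed.
  class EagerJoin(table: String) {
    val broadcastFuture: Future[String] = Future(collectTable(table))
  }

  // Runs `body` and returns the elapsed wall-clock time in milliseconds.
  private def timeMs(body: => Unit): Long = {
    val start = System.nanoTime()
    body
    (System.nanoTime() - start) / 1000000L
  }

  // Two lazy operators: the second collection starts only after the
  // first one has been awaited, so the delays add up.
  def runLazy(): Long = timeMs {
    val a = new LazyJoin("a")
    val b = new LazyJoin("b")
    Await.result(a.broadcastFuture, 10.seconds)
    Await.result(b.broadcastFuture, 10.seconds)
  }

  // Two eager operators: both collections start at construction and
  // overlap, so the total is close to a single delay.
  def runEager(): Long = timeMs {
    val a = new EagerJoin("a")
    val b = new EagerJoin("b")
    Await.result(a.broadcastFuture, 10.seconds)
    Await.result(b.broadcastFuture, 10.seconds)
  }

  def main(args: Array[String]): Unit = {
    println(s"lazy val: ${runLazy()} ms")
    println(s"val:      ${runEager()} ms")
  }
}
```

    In this sketch the lazy variant should take roughly twice as long as
    the eager one, because the second Future is not even created until the
    first has finished. Exact timings depend on the machine and thread pool.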

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mateiz/spark spark-3084

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1990.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1990
    
----
commit f468766e2051f323ed81ecc53c27bed7becdc9b1
Author: Matei Zaharia <[email protected]>
Date:   2014-08-17T02:09:34Z

    [SPARK-3084] Collect broadcasted tables in parallel in joins
    
    BroadcastHashJoin has a broadcastFuture variable that tries to collect
    the broadcasted table in a separate thread, but this doesn't help
    because it's a lazy val that only gets initialized when you attempt to
    build the RDD. Thus queries that broadcast multiple tables would collect
    and broadcast them sequentially. I changed this to a val to let it start
    collecting right when the operator is created.

----

