[GitHub] spark pull request: [SQL] [SPARK-6794] Use kryo-based SparkSqlSeri...

aarondav Thu, 09 Apr 2015 14:20:43 -0700

Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/5433#issuecomment-91358635
  
    Note that this is used only for BroadcastHashJoins, and only for the 
broadcast table, which is therefore expected to be relatively small (< 10 MB). 
For Spark users with the default configuration, the broadcast is serialized 
using Java serialization, which turns out to be much slower than Kryo in this 
case (see the benchmark).
    
    This turns out to be a pretty significant win for very short queries 
(on the order of seconds), where a large portion of the time may be spent 
performing the broadcast.
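
    For context, a minimal sketch of the configuration being discussed, not 
code from this pull request. It assumes a stock Spark setup where 
`spark.serializer` has been left at its Java-serialization default; the app 
name is hypothetical, and the threshold line only restates the default 10 MB 
cutoff below which a table is considered for a broadcast join.

```scala
import org.apache.spark.SparkConf

// Sketch only: what a user could set by hand to avoid Java serialization.
val conf = new SparkConf()
  .setAppName("kryo-broadcast-sketch") // hypothetical name, for illustration
  // Default is org.apache.spark.serializer.JavaSerializer; this opts into Kryo.
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Tables smaller than this many bytes may be broadcast in a join (10 MB default).
  .set("spark.sql.autoBroadcastJoinThreshold", (10 * 1024 * 1024).toString)
```

    Most users never change `spark.serializer`, which is why the serializer 
used for the broadcast table under the default configuration matters here.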



