GitHub user aarondav commented on the pull request:
https://github.com/apache/spark/pull/5433#issuecomment-91358635
Note that this is used only for BroadcastHashJoins, where it is the
broadcast table, so it is expected to be relatively small (< 10 MB). For
Spark users with the default configuration, the broadcast is serialized using
Java serialization, which turns out to be much slower than Kryo in this case
(see the benchmark).
This is a significant win for very short queries (on the order of seconds),
where a large portion of the time may be spent performing the broadcast.