GitHub user JoshRosen opened a pull request:
https://github.com/apache/spark/pull/11801
[SPARK-13990] Automatically pick serializer when caching RDDs
Building on the `SerializerManager` introduced in SPARK-13926 / #11755, this
patch modifies Spark's BlockManager to use RDDs' ClassTags in order to
select the best serializer to use when caching RDD blocks.
When storing a local block, the BlockManager `put()` methods use implicits
to record ClassTags and store those tags in the blocks' BlockInfo records.
When reading a local block, the stored ClassTag is used to pick the appropriate
serializer. When a block is stored with replication, the class tag is written
into the block transfer metadata and will also be stored in the remote
BlockManager.
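The store/read flow described above can be sketched in a few lines of plain Scala. This is only an illustration under stated assumptions, not the code in this patch: `SerializerChoice`, `SerializerPicker`, and `BlockStore` are hypothetical stand-ins for Spark's serializers, `SerializerManager`, and BlockManager, and the set of types routed to the fast serializer is assumed for the example.

```scala
import scala.reflect.{ClassTag, classTag}

// Hypothetical stand-ins; these are not Spark's classes.
sealed trait SerializerChoice
case object FastSerializer extends SerializerChoice    // e.g. a Kryo-style serializer
case object DefaultSerializer extends SerializerChoice // e.g. the configured default

object SerializerPicker {
  // Element types assumed safe for the fast serializer in this sketch.
  private val fastTags: Set[ClassTag[_]] = Set[ClassTag[_]](
    ClassTag.Boolean, ClassTag.Byte, ClassTag.Char, ClassTag.Short,
    ClassTag.Int, ClassTag.Long, ClassTag.Float, ClassTag.Double,
    classTag[String]
  )

  // Pick a serializer from the element type's ClassTag alone.
  def pick(ct: ClassTag[_]): SerializerChoice =
    if (fastTags.contains(ct)) FastSerializer else DefaultSerializer
}

// Hypothetical block store that records the tag captured at put() time.
final class BlockStore {
  private val tags = scala.collection.mutable.Map.empty[String, ClassTag[_]]

  // The context bound makes the compiler supply the ClassTag implicitly.
  // The block payload itself is ignored in this sketch.
  def put[T: ClassTag](blockId: String, values: Iterator[T]): Unit = {
    tags(blockId) = implicitly[ClassTag[T]]
  }

  // On read, the stored tag (not the data) decides which serializer to use.
  def serializerFor(blockId: String): Option[SerializerChoice] =
    tags.get(blockId).map(SerializerPicker.pick)
}
```

In this sketch, `put("rdd_0_0", Iterator(1, 2, 3))` records the tag for `Int`, so a later `serializerFor("rdd_0_0")` selects the fast serializer, while a block of `Seq[Int]` falls back to the default.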
There are two or three places where we don't properly pass ClassTags,
including TorrentBroadcast and BlockRDD. I think this happens to work because
the missing ClassTag always happens to be `ClassTag.Any`, but it might be worth
looking more carefully at those places to see whether we should be more
explicit.
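To illustrate the fallback mentioned above, here is a minimal plain-Scala sketch (no Spark dependency) of how a call site that has lost its element type ends up with a tag for `Any`. The object `ClassTagFallback` and the helper `recordedTag` are hypothetical names used only for this example.

```scala
import scala.reflect.ClassTag

object ClassTagFallback {
  // Hypothetical helper: returns the ClassTag the compiler supplies for T.
  def recordedTag[T: ClassTag](values: Iterator[T]): ClassTag[T] =
    implicitly[ClassTag[T]]

  // The element type is known statically, so the tag describes Int.
  val typed: ClassTag[Int] = recordedTag(Iterator(1, 2, 3))

  // The static type is only Any, so the tag describes Any even though
  // every element is an Int at runtime.
  val erased: Iterator[Any] = Iterator(1, 2, 3)
  val untyped: ClassTag[Any] = recordedTag(erased)
}
```

A tag for `Any` carries no useful type information (its runtime class is `java.lang.Object`), so a tag-based chooser would presumably treat such blocks as generic objects rather than pick a specialized serializer.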
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/JoshRosen/spark pick-best-serializer-for-caching
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11801.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11801
----
commit ba322045de94df8ae41d2a25e8f6fa34f5b5c089
Author: Josh Rosen <[email protected]>
Date: 2016-03-17T20:02:54Z
Add ClassTags to BlockInfo.
commit f22f8ee16f7212178c83ad2c7a22c767dee4fa63
Author: Josh Rosen <[email protected]>
Date: 2016-03-17T21:09:11Z
Construct BlockManager with a SerializerManager
commit c30c6ee4905e3b836ce231e6a506ba11faad1f12
Author: Josh Rosen <[email protected]>
Date: 2016-03-17T21:59:28Z
Propagate ClassTags in a bunch more places.
commit 359fb7efea5ce06c3a43e67013f585067dc9cf4b
Author: Josh Rosen <[email protected]>
Date: 2016-03-17T22:39:30Z
Propagate class tags during block replication.
----