GitHub user JoshRosen opened a pull request:

    https://github.com/apache/spark/pull/11801

    [SPARK-13990] Automatically pick serializer when caching RDDs

    Building on the `SerializerManager` introduced in SPARK-13926 / #11755, this
    patch modifies Spark's BlockManager to use RDDs' ClassTags to select the best
    serializer to use when caching RDD blocks.
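A minimal Scala sketch of what ClassTag-based selection could look like, assuming a whitelist rule: primitive types, arrays of primitives, and `String` take a faster serializer, and every other type takes the general-purpose one. `SerializerChoice`, `serializerFor`, `serializerForType`, and the `"kryo"`/`"java"` labels are hypothetical names used for illustration; this is not the `SerializerManager` API.

```scala
import scala.reflect.ClassTag

// Hypothetical sketch of ClassTag-driven serializer selection; not Spark's API.
object SerializerChoice {
  // Primitive types, assumed safe for a fast serializer without registration.
  private val primitiveTags: Set[ClassTag[_]] = Set(
    ClassTag.Boolean, ClassTag.Byte, ClassTag.Char, ClassTag.Double,
    ClassTag.Float, ClassTag.Int, ClassTag.Long, ClassTag.Null, ClassTag.Short)

  // Arrays of those primitives, e.g. the tag for Array[Int].
  private val primitiveArrayTags: Set[ClassTag[_]] = primitiveTags.map(_.wrap)

  // Whitelist: primitives, primitive arrays, and String.
  private val fastPathTags: Set[ClassTag[_]] =
    primitiveTags ++ primitiveArrayTags + implicitly[ClassTag[String]]

  // Any tag outside the whitelist falls back to the general-purpose serializer.
  def serializerFor(ct: ClassTag[_]): String =
    if (fastPathTags.contains(ct)) "kryo" else "java"

  // Convenience overload: the compiler supplies the ClassTag for T.
  def serializerForType[T: ClassTag]: String =
    serializerFor(implicitly[ClassTag[T]])
}
```

Under this rule, a tag the selector does not recognize costs speed rather than correctness: the block simply takes the general-purpose path.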
    
    When storing a local block, the BlockManager `put()` methods use implicits to
    record ClassTags and store those tags in the blocks' BlockInfo records. When
    reading a local block, the stored ClassTag is used to pick the appropriate
    serializer. When a block is stored with replication, the ClassTag is written
    into the block transfer metadata and is also stored in the remote
    BlockManager.
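The put, read, and replicate flow described above can be sketched as follows. `SketchBlockStore`, `SketchBlockInfo`, `replicateTo`, and the toy selection rule are hypothetical. Encoding the tag with plain Java serialization is an assumption about what "written into the block transfer metadata" could look like, not a description of the patch's actual wire format.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.collection.mutable
import scala.reflect.ClassTag

// Hypothetical, simplified stand-in for BlockInfo: only the ClassTag field.
final case class SketchBlockInfo(classTag: ClassTag[_])

// Hypothetical stand-in for a BlockManager; not Spark's real API.
class SketchBlockStore {
  private val infos = mutable.Map.empty[String, SketchBlockInfo]
  private val blocks = mutable.Map.empty[String, Seq[Any]]

  // The context bound `T: ClassTag` is an implicit parameter that the compiler
  // fills in at each call site, so callers never pass the tag by hand.
  def put[T: ClassTag](blockId: String, values: Seq[T]): Unit = {
    infos(blockId) = SketchBlockInfo(implicitly[ClassTag[T]])
    blocks(blockId) = values
  }

  // Reads pick the serializer from the tag recorded at write time.
  def serializerFor(blockId: String): Option[String] =
    infos.get(blockId).map(info => pickSerializer(info.classTag))

  // Replication: ship the tag as transfer metadata so the remote store
  // records the same ClassTag as the local one.
  def replicateTo(blockId: String, remote: SketchBlockStore): Unit =
    remote.receive(blockId, blocks(blockId), encodeTag(infos(blockId).classTag))

  private def receive(blockId: String, values: Seq[Any], metadata: Array[Byte]): Unit = {
    infos(blockId) = SketchBlockInfo(decodeTag(metadata))
    blocks(blockId) = values
  }

  // Assumed encoding: plain Java serialization (ClassTag is Serializable).
  private def encodeTag(ct: ClassTag[_]): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(ct) finally out.close()
    bytes.toByteArray
  }

  private def decodeTag(metadata: Array[Byte]): ClassTag[_] = {
    val in = new ObjectInputStream(new ByteArrayInputStream(metadata))
    try in.readObject().asInstanceOf[ClassTag[_]] finally in.close()
  }

  // Toy selection rule: only Int and String take the "kryo" fast path.
  private def pickSerializer(ct: ClassTag[_]): String =
    if (ct == ClassTag.Int || ct == implicitly[ClassTag[String]]) "kryo" else "java"
}
```

Because the remote store decodes the tag from the metadata rather than guessing it, both copies of a replicated block resolve to the same serializer.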
    
    There are two or three places where we don't properly pass ClassTags,
    including TorrentBroadcast and BlockRDD. I think this happens to work only
    because the missing ClassTag is always `ClassTag.Any`, but it might be worth
    looking more carefully at those places to see whether we should be more
    explicit.
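One way to see why a missing ClassTag can still work: a caller with no `ClassTag[T]` in scope can only pass `ClassTag.Any`, and a whitelist-style selector (assumed here) never matches `ClassTag.Any`, so those blocks always take the general-purpose path. `AnyFallbackSketch` and its methods are hypothetical names for illustration.

```scala
import scala.reflect.ClassTag

// Hypothetical selector used to illustrate the ClassTag.Any fallback; not Spark's API.
object AnyFallbackSketch {
  private val fastPathTags: Set[ClassTag[_]] =
    Set(ClassTag.Int, ClassTag.Long, ClassTag.Double, implicitly[ClassTag[String]])

  def serializerNameFor(ct: ClassTag[_]): String =
    if (fastPathTags.contains(ct)) "kryo" else "java"

  // A ClassTag is in scope, so the precise tag reaches the selector.
  def cacheWithTag[T: ClassTag](values: Seq[T]): String =
    serializerNameFor(implicitly[ClassTag[T]])

  // No ClassTag bound: the only tag available is ClassTag.Any, which the
  // whitelist never matches, so the general-purpose serializer is chosen.
  def cacheWithoutTag[T](values: Seq[T]): String =
    serializerNameFor(ClassTag.Any)
}
```

The same data can therefore be cached correctly either way; only the tagged call site gets the faster serializer, which is why making those call sites explicit is an optimization question rather than a correctness bug.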

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark pick-best-serializer-for-caching

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11801.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11801
    
----
commit ba322045de94df8ae41d2a25e8f6fa34f5b5c089
Author: Josh Rosen <[email protected]>
Date:   2016-03-17T20:02:54Z

    Add ClassTags to BlockInfo.

commit f22f8ee16f7212178c83ad2c7a22c767dee4fa63
Author: Josh Rosen <[email protected]>
Date:   2016-03-17T21:09:11Z

    Construct BlockManager with a SerializerManager

commit c30c6ee4905e3b836ce231e6a506ba11faad1f12
Author: Josh Rosen <[email protected]>
Date:   2016-03-17T21:59:28Z

    Propagate ClassTags in a bunch more places.

commit 359fb7efea5ce06c3a43e67013f585067dc9cf4b
Author: Josh Rosen <[email protected]>
Date:   2016-03-17T22:39:30Z

    Propagate class tags during block replication.

----

