GitHub user JoshRosen opened a pull request:

    https://github.com/apache/spark/pull/11755

    [SPARK-13926] Automatically use Kryo serializer when shuffling RDDs with simple types

    Because ClassTags are available when constructing a ShuffledRDD, we can
    use them to automatically select Kryo for shuffle serialization when the
    RDD's types are guaranteed to be compatible with Kryo.
    
    This patch introduces `SerializerManager`, a component that picks the
    "best" serializer for a shuffle given the elements' ClassTags. It
    automatically picks a Kryo serializer for ShuffledRDDs whose key, value,
    and/or combiner types are primitives, arrays of primitives, or strings.
    In the future, this class can serve as a narrow extension point for
    integrating specialized serializers for other types, such as ByteBuffers.
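
As a rough illustration of the selection rule described above, here is a minimal sketch. It is not the code from this patch: the object name `SerializerChoiceSketch`, the `Choice` labels, and the `kryoSafe` and `choose` helpers are all hypothetical, and it only looks at key and value ClassTags (combiner types are left out for brevity). It assumes that inspecting a ClassTag's runtime class is enough to recognize primitives, arrays of primitives, and strings.

```scala
import scala.reflect.{ClassTag, classTag}

// Hypothetical sketch; names here are illustrative and not Spark's API.
object SerializerChoiceSketch {
  // Labels standing in for real serializer instances.
  sealed trait Choice
  case object UseKryo extends Choice
  case object UseDefault extends Choice

  // True when the tag's runtime class is a primitive, an array of
  // primitives, or java.lang.String.
  def kryoSafe(ct: ClassTag[_]): Boolean = {
    val cls = ct.runtimeClass
    cls.isPrimitive ||
      (cls.isArray && cls.getComponentType.isPrimitive) ||
      cls == classOf[String]
  }

  // Pick Kryo only when both the key and the value type qualify;
  // otherwise fall back to whatever serializer is configured.
  def choose(keyTag: ClassTag[_], valueTag: ClassTag[_]): Choice =
    if (kryoSafe(keyTag) && kryoSafe(valueTag)) UseKryo else UseDefault
}
```

Under these assumptions, `SerializerChoiceSketch.choose(classTag[Int], classTag[Array[Long]])` yields `UseKryo`, while `SerializerChoiceSketch.choose(classTag[String], classTag[List[Int]])` yields `UseDefault`, since `List[Int]` is not one of the simple types.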
    
    In a followup patch, I plan to extend the BlockManager APIs so that we
    can use similar automatic serializer selection when caching RDDs (this
    is a little trickier because the ClassTags need to be threaded through
    many more places).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark automatically-pick-best-serializer

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11755.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11755
    
----
commit 035f227a93c2d69f03d0abdd5701245e1962a8f4
Author: Josh Rosen <[email protected]>
Date:   2016-03-08T07:36:33Z

    Remove Serializer.getSerializer()

commit 35b32b3150327fc4cf50123211abfcef4d9bcedb
Author: Josh Rosen <[email protected]>
Date:   2016-03-16T05:55:29Z

    Wire up automatic serializer selection.

commit 876f038b56688357e90b0c5edbfaf6553587f1fd
Author: Josh Rosen <[email protected]>
Date:   2016-03-16T06:15:27Z

    Remove print statements.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

