Fernando Pereira created SPARK-20580:
----------------------------------------

             Summary: Allow RDD cache with unserializable objects
                 Key: SPARK-20580
                 URL: https://issues.apache.org/jira/browse/SPARK-20580
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 1.3.0
            Reporter: Fernando Pereira
            Priority: Minor


In my current scenario we load complex Python objects on the worker nodes that 
are not completely serializable. We then apply certain map operations to the 
RDD, which at some point we collect. In this basic usage everything works well.

However, if we cache() the RDD (which defaults to memory), the transformations 
after the caching step suddenly fail to execute. Apparently caching serializes 
the RDD data and deserializes it whenever further transformations are 
required.
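
To illustrate the failure mode without a cluster, here is a minimal 
plain-Python sketch. It does not use Spark at all: Handle, load, transform 
and both run_* functions are hypothetical names, and the "cache" is only 
simulated with pickle, on the assumption that caching stores a pickled copy 
of the data and reloads it before later transformations run.

```python
import pickle
import threading


class Handle:
    """Hypothetical stand-in for a complex, not fully serializable object."""

    def __init__(self, value):
        self.value = value
        self._lock = threading.Lock()  # lock objects cannot be pickled


def load(value):
    # Plays the role of the step that builds the object on a worker.
    return Handle(value)


def transform(handle):
    # Plays the role of a later map operation.
    return handle.value * 2


def run_pipelined(values):
    # Chained steps in one process: the objects are never serialized.
    return [transform(load(v)) for v in values]


def run_with_simulated_cache(values):
    # Assumption: caching pickles the data and unpickles it for later steps.
    stored = pickle.dumps([load(v) for v in values])
    return [transform(h) for h in pickle.loads(stored)]


pipelined_result = run_pipelined([1, 2, 3])

try:
    run_with_simulated_cache([1, 2, 3])
    cache_error = None
except (TypeError, pickle.PicklingError) as exc:
    cache_error = exc
```

In this sketch the pipelined run completes, while the simulated cache step 
raises as soon as it tries to pickle the objects, which matches the behaviour 
described above.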

It would be nice to avoid serializing the objects when they are cached in 
memory, and to keep the original objects instead.
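
Until something like that exists, one possible workaround (a sketch only, not 
what this issue asks for) is to make the object survive a pickle round trip 
by dropping the unserializable part and rebuilding it on load. The Handle 
class and its _lock attribute are hypothetical; note that this yields a 
reconstructed copy, not the original object.

```python
import pickle
import threading


class Handle:
    """Hypothetical object with one attribute that cannot be pickled."""

    def __init__(self, value):
        self.value = value
        self._lock = threading.Lock()

    def __getstate__(self):
        # Serialize everything except the unpicklable lock.
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Restore the plain attributes and recreate the lock.
        self.__dict__.update(state)
        self._lock = threading.Lock()


restored = pickle.loads(pickle.dumps(Handle(21)))
doubled = restored.value * 2
```

This only helps when the unserializable state can be rebuilt cheaply on 
deserialization; it does not avoid the serialization cost itself.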



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

