Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/755#discussion_r13165029
  
    --- Diff: python/pyspark/context.py ---
    @@ -51,6 +51,7 @@ class SparkContext(object):
         _active_spark_context = None
         _lock = Lock()
         _python_includes = None # zip and egg files that need to be added to PYTHONPATH
    +    _pickle_file_serializer = BatchedSerializer(PickleSerializer(), 1024)
    --- End diff --
    
    You should make the default batch size smaller than 1024, because some
    objects users work with might be very large; I'd set it to only 10. If
    you'd like, you can also add an optional batchSize parameter to
    RDD.saveAsPickleFile.
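    The trade-off behind this suggestion can be sketched in plain Python.
    The helpers below are illustrative only, not PySpark's actual
    BatchedSerializer implementation: batching groups N records together
    before pickling, so a large batch size means each serialized chunk
    must hold N objects in memory at once.

    ```python
    import pickle

    def batched(iterator, batch_size):
        """Group items from an iterator into lists of at most batch_size,
        mirroring what a batched serializer does before pickling."""
        batch = []
        for item in iterator:
            batch.append(item)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:
            yield batch

    def serialize_batched(items, batch_size):
        """Pickle each batch as a single unit. With batch_size objects per
        chunk, memory use per chunk grows with both batch size and object
        size, which is why a smaller default is safer for large objects."""
        return [pickle.dumps(b) for b in batched(iter(items), batch_size)]

    # 100 records with batch size 1024: everything lands in one chunk.
    chunks_large = serialize_batched(range(100), 1024)
    # The same records with batch size 10: ten small chunks instead.
    chunks_small = serialize_batched(range(100), 10)
    ```

    A smaller default keeps each pickled chunk bounded even when
    individual records are big, at the cost of slightly more per-batch
    overhead for tiny records.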

