GitHub user falaki opened a pull request:

    https://github.com/apache/spark/pull/1595

    [Core][SPARK-2696] Reduce default value of 
spark.serializer.objectStreamReset

    The current default value of spark.serializer.objectStreamReset is 10,000, 
    meaning the JavaSerializer only resets its underlying ObjectOutputStream 
    (which retains a reference to every object it has written) once every 
    10,000 records. When re-partitioning a large file of 1MB records into, 
    e.g., 64 partitions, the serializers can therefore pin up to 
    10000 x 1MB x 64 ~= 640 GB of records in the worst case, causing 
    out-of-memory errors.
    
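    The caching described above is standard java.io.ObjectOutputStream 
behavior, which Spark's JavaSerializer wraps; spark.serializer.objectStreamReset 
is the interval at which Spark calls reset(). A minimal JDK-only sketch (no 
Spark; class and method names here are illustrative) of what reset() actually 
clears:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

// Sketch of the mechanism behind spark.serializer.objectStreamReset:
// an ObjectOutputStream memoizes every object it writes in a back-reference
// table, and only reset() releases those references.
public class ResetDemo {
    // Serialize the same record 10,000 times, resetting every `resetEvery`
    // writes (0 = never reset); return the number of bytes produced.
    static long bytesWritten(int resetEvery) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(buf);
        String record = "x".repeat(1000); // small stand-in for a 1MB record
        for (int i = 1; i <= 10_000; i++) {
            out.writeObject(record);
            if (resetEvery > 0 && i % resetEvery == 0) {
                out.reset(); // drop the back-reference table
            }
        }
        out.close();
        return buf.size();
    }

    public static void main(String[] args) throws IOException {
        long never = bytesWritten(0);
        long every100 = bytesWritten(100);
        // Without reset() the repeated record is fully written once and then
        // emitted only as tiny back-references, so the output is smaller --
        // but the stream's table keeps a strong reference to every distinct
        // object written, which is what exhausts the heap when the records
        // are large and distinct.
        System.out.println("never reset:     " + never + " bytes");
        System.out.println("reset every 100: " + every100 + " bytes");
    }
}
```

    The trade-off is exactly this: a larger reset interval deduplicates more 
repeated objects in the output, at the cost of holding more objects live on 
the heap.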
    This patch lowers the default to a more reasonable value (100).
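    Until a build with the new default is available, the interval can also be 
lowered per application through the same configuration key; a hedged sketch 
(the app name is hypothetical, and this fragment assumes Spark on the 
classpath -- only the config key comes from this patch):

```java
import org.apache.spark.SparkConf;

public class LowerResetInterval {
    public static void main(String[] args) {
        // Hypothetical application; overrides the serializer reset interval
        // to the value this patch proposes as the default.
        SparkConf conf = new SparkConf()
            .setAppName("repartition-job")
            .set("spark.serializer.objectStreamReset", "100");
    }
}
```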

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/falaki/spark objectStreamReset

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1595.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1595
    
----
commit 1aa0df87db69d3c814b827e27673b198acf49edb
Author: Hossein <[email protected]>
Date:   2014-07-25T22:56:06Z

    Reduce default value of spark.serializer.objectStreamReset

commit 650a935cdd810fe7bbc43555ad126cb2bebaab92
Author: Hossein <[email protected]>
Date:   2014-07-25T23:05:05Z

    Updated documentation

----

