GitHub user davies opened a pull request:

    https://github.com/apache/spark/pull/3003

    [SPARK-3466] Limit size of results that a driver collects for each action

    Right now, operations like collect() and take() can crash the driver with 
an OOM if they bring back too much data.
    
    This PR introduces spark.driver.maxResultSize. Once it is set, the 
driver will abort a job if the job's result is bigger than that limit.
    
    By default, it's unlimited, for backward compatibility.
    
    In local mode, the driver and executors share the same JVM, and the default 
JVM heap size is small, so there is no good default limit: a value small enough 
to protect the JVM from OOM would be too small for normal usage.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark collect

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3003.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3003
    
----
commit ca8267d2b0fbbf7ebcf59e0f50862ed10433e9d4
Author: Davies Liu <[email protected]>
Date:   2014-10-29T20:58:25Z

    limit the size of data by collect

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
