GitHub user marmbrus opened a pull request:

    https://github.com/apache/spark/pull/3064

    [SQL] More aggressive defaults

     - Turns on compression for in-memory cached data by default
     - Changes the default Parquet compression codec back to gzip (we have
seen more OOMs with production workloads due to the way Snappy allocates memory)
     - Increases the in-memory columnar batch size to 10,000 rows
     - Increases the broadcast join threshold to 10 MB
     - Uses our Parquet implementation instead of the Hive one by default
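    The defaults above are ordinary SQL configuration properties, so they can
also be set (or overridden back) per session. A minimal sketch in Scala; the
config key names here are assumptions based on Spark SQL's SQLConf naming of
that era, not part of this patch:

```scala
// Sketch: explicitly setting the new defaults on a SQLContext.
// All key names below are assumed (check SQLConf for the authoritative list).
sqlContext.setConf("spark.sql.inMemoryColumnarStorage.compressed", "true")   // compress cached data
sqlContext.setConf("spark.sql.parquet.compression.codec", "gzip")            // back to gzip
sqlContext.setConf("spark.sql.inMemoryColumnarStorage.batchSize", "10000")   // 10,000-row batches
sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold",
  (10 * 1024 * 1024).toString)                                               // 10 MB broadcast limit
```

    A workload that regresses under these values can set the same keys back
to their old values without rebuilding Spark.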

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/marmbrus/spark fasterDefaults

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3064.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3064
    
----
commit da373f9d619fe5c8b71cb21c32d8e984872cc572
Author: Michael Armbrust <[email protected]>
Date:   2014-11-03T00:07:05Z

    More aggressive defaults

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
