GitHub user chutium opened a pull request:
https://github.com/apache/spark/pull/2039
[SPARK-3131][SQL] Allow user to set parquet compression codec for writing
ParquetFile in SQLContext
There are 4 different compression codecs available for
```ParquetOutputFormat```
Currently the codec is a hard-coded value in
```ParquetRelation.defaultCompression```
Original discussion:
https://github.com/apache/spark/pull/195#discussion-diff-11002083
I added a new config property in SQLConf that lets the user change this
compression codec, and I used a short-name syntax similar to the one described in
SPARK-2953 (https://github.com/apache/spark/pull/1873/files#diff-0)
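As a rough illustration of the short-name idea, here is a minimal, dependency-free sketch of resolving a user-supplied codec name. It is not the code in this patch: the object name `ParquetCodecNames`, the `resolve` helper, and the list of accepted names (assumed here to be UNCOMPRESSED, SNAPPY, GZIP and LZO) are all hypothetical, and the fallback is GZIP only because that is the current hard-coded value.

```scala
// Hypothetical sketch, not the patch itself: every name here is an assumption.
object ParquetCodecNames {
  // Assumed set of accepted short names, compared case-insensitively.
  val supported: Set[String] = Set("UNCOMPRESSED", "SNAPPY", "GZIP", "LZO")

  // Fallback when the user sets nothing; GZIP mirrors the current hard-coded value.
  val defaultCodec: String = "GZIP"

  // Normalize the user's value, fall back to the default, and reject unknown names.
  def resolve(userValue: Option[String]): String = {
    val name = userValue
      .map(_.trim.toUpperCase(java.util.Locale.ROOT))
      .getOrElse(defaultCodec)
    require(
      supported.contains(name),
      s"Unsupported Parquet codec '$name'; expected one of ${supported.mkString(", ")}")
    name
  }
}
```

Under these assumptions, `ParquetCodecNames.resolve(Some("snappy"))` yields `SNAPPY`, and an unset property falls back to the default. Failing fast on an unknown name keeps a typo in the config from silently changing how files are written.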
By the way, which codec should we use as the default? It was set to GZIP
(https://github.com/apache/spark/pull/195/files#diff-4), but I think we
should change it to SNAPPY, since SNAPPY is already the default codec for
shuffling in spark-core (SPARK-2469,
https://github.com/apache/spark/pull/1415/files#diff-0) and parquet-mr
supports the Snappy codec natively
(https://github.com/Parquet/parquet-mr/commit/e440108de57199c12d66801ca93804086e7f7632).
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/chutium/spark parquet-compression
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/2039.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2039
----
commit 21235dc68b5e5d935a506ffbe538baadb844ffe5
Author: chutium <[email protected]>
Date: 2014-08-19T19:42:34Z
[SPARK-3131][SQL] Allow user to set parquet compression codec for writing
ParquetFile in SQLContext
----