GitHub user ypcat opened a pull request:
https://github.com/apache/spark/pull/5525
[SPARK-6352] [SQL] Custom parquet output committer
Add a new config, "spark.sql.parquet.output.committer.class", to allow a custom
Parquet output committer, and add an output committer class intended for use on S3.
Fix compilation error introduced by
https://github.com/apache/spark/pull/5042.
Respect ParquetOutputFormat.ENABLE_JOB_SUMMARY flag.
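For illustration only, the two settings touched by this change might be written as the key/value pairs below. This is a sketch under assumptions, not something stated in the pull request: the fully qualified class name is inferred from the file path in the commit list, the key/value layout is hypothetical, and this message does not say whether the committer key is read from the Spark SQL configuration or from the Hadoop configuration. The second key is the one named in the commit list for the Parquet job summary flag.

```
# Hypothetical values; class name inferred from
# sql/core/src/main/scala/org/apache/spark/sql/parquet/DirectParquetOutputCommitter.scala
spark.sql.parquet.output.committer.class  org.apache.spark.sql.parquet.DirectParquetOutputCommitter

# Parquet job summary flag mentioned in the commit list (value shown is an example)
parquet.enable.summary-metadata  false
```

Per the commit message in this pull request, the class named by the first key should be a sub-class of ParquetOutputCommitter.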
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ypcat/spark spark-6352
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/5525.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5525
----
commit f75e261c5652e4d6fa69e6f790f5b4a9238ad29e
Author: Pei-Lun Lee <[email protected]>
Date: 2015-03-16T06:44:00Z
DirectParquetOutputCommitter
commit 769bd6737acc3f13a8678688e842e902fc38e802
Author: Pei-Lun Lee <[email protected]>
Date: 2015-03-13T03:48:03Z
DirectParquetOutputCommitter
commit 0fc03ca563c38ae0b625898c59730cdedfb0534b
Author: Pei-Lun Lee <[email protected]>
Date: 2015-03-17T07:27:44Z
[SPARK-6352] [SQL] hide class DirectParquetOutputCommitter
commit c42468c9b207edd995524afc2ebb1f723e375d20
Author: Pei-Lun Lee <[email protected]>
Date: 2015-03-17T07:28:28Z
[SPARK-6352] [SQL] add test case
commit 0d540b9cfc03fc71d228616123f8cad4602e8f14
Author: Pei-Lun Lee <[email protected]>
Date: 2015-03-17T07:56:02Z
[SPARK-6352] [SQL] add license
commit 9ae7545701f522702f2d0240367fc6fba06b7c26
Author: Pei-Lun Lee <[email protected]>
Date: 2015-03-23T10:32:15Z
[SPARK-6352] [SQL] Change to allow custom parquet output committer.
Add a new configuration key: spark.sql.parquet.output.committer.class
which should be a sub-class of ParquetOutputCommitter
commit e17bf474ab3db112958cf67318d13305bde60788
Author: Pei-Lun Lee <[email protected]>
Date: 2015-03-23T10:37:55Z
Merge branch 'master' of https://github.com/apache/spark into spark-6352
Conflicts:
sql/core/src/test/scala/org/apache/spark/sql/parquet/ParquetIOSuite.scala
commit fe659151c7f8e2547404fe8a93c6010ceebb865a
Author: Pei-Lun Lee <[email protected]>
Date: 2015-04-14T08:22:07Z
add support for parquet config parquet.enable.summary-metadata
commit 8413fcd7ec11d2c1892283643f89bd1bf0bbe062
Author: Pei-Lun Lee <[email protected]>
Date: 2015-04-14T08:31:17Z
Merge branch 'master' of https://github.com/apache/spark into spark-6352
Conflicts:
sql/core/src/main/scala/org/apache/spark/sql/parquet/DirectParquetOutputCommitter.scala
commit 9ece5c5cb366ba34fd542fe207dcdd6564385448
Author: Pei-Lun Lee <[email protected]>
Date: 2015-04-15T06:10:28Z
compatibility with hadoop 1.x
commit ddd0f69258a3d61552fbebe3fafeffd364fca322
Author: Pei-Lun Lee <[email protected]>
Date: 2015-04-15T09:45:54Z
Merge branch 'master' of https://github.com/apache/spark into spark-6352
Conflicts:
sql/core/src/main/scala/org/apache/spark/sql/parquet/DirectParquetOutputCommitter.scala
sql/core/src/test/scala/org/apache/spark/sql/parquet/ParquetIOSuite.scala
commit 472870e290ffb9e264889d72d6275e7abc6c231f
Author: Pei-Lun Lee <[email protected]>
Date: 2015-04-15T10:19:02Z
add back custom parquet output committer
----