GitHub user AndreSchumacher opened a pull request:
https://github.com/apache/spark/pull/195
Spark parquet improvements
A few improvements to the Parquet support for SQL queries:
- A ParquetRelation is now backed by a directory instead of a single file, which simplifies importing data from other sources
- The InsertIntoParquetTable operation now supports either overwriting or appending to an existing table (at least in HiveQL)
- Tests now use the new API
- Parquet logging can be set to WARNING level (the default)
- Parquet files now use a default compression codec (GZIP, as in parquet-mr)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/AndreSchumacher/spark spark_parquet_improvements
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/195.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #195
----
commit 14a1d2c1a5df4475ffccaf3ce36a41f2234ec3b7
Author: Andre Schumacher <[email protected]>
Date: 2014-03-18T13:23:26Z
Changing ParquetRelation underlying data from file to dir
commit d6630d408cfadd6ef67767f4794f57b7ebe1c605
Author: Andre Schumacher <[email protected]>
Date: 2014-03-19T06:04:37Z
Optional overwrite when inserting into ParquetRelation
commit d1d3639d8c45f93dd485628cb02ab5ef1dccc93e
Author: Andre Schumacher <[email protected]>
Date: 2014-03-19T11:32:51Z
Update of Parquet tests to new API
commit 233e67f5571b9be38c04c0205ed522455cd91661
Author: Andre Schumacher <[email protected]>
Date: 2014-03-19T17:15:09Z
Implementing appending to existing ParquetRelation
commit b8abe01a54090191e89d49f0960096b0844b3fd7
Author: Andre Schumacher <[email protected]>
Date: 2014-03-14T19:09:15Z
Setting Parquet log level
commit 0a07f05e094e99317913c619c9ff08bc45cc933c
Author: Andre Schumacher <[email protected]>
Date: 2014-03-21T13:14:55Z
Adding Parquet debug parameter and default compression
commit 05ed2477718798fd373c9b478f25518bf8919381
Author: Andre Schumacher <[email protected]>
Date: 2014-03-21T15:26:12Z
Adding future example
----