GitHub user krisgeus opened a pull request:
https://github.com/apache/spark/pull/21893
Support selecting from partitioned tables with partitions having different
data formats
## What changes were proposed in this pull request?
Before these changes, Spark failed when selecting from a Hive partitioned
table whose partitions use different data formats, whereas Hive returned the
correct results.
Spark now also supports the following SQL statements:
ALTER TABLE <table> SET FILEFORMAT PARQUET
and
ALTER TABLE <table> PARTITION <partitionspec> SET FILEFORMAT PARQUET
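As a rough illustration of how these two statements could lead to a table with mixed partition formats, the sketch below uses a hypothetical table `events` with a partition column `dt`. The table name, columns, partition values, and inserted rows are all assumptions made for the example; they are not taken from this patch or its test suite.

```sql
-- Hypothetical table; all names and values are illustrative only.
CREATE TABLE events (id INT, msg STRING)
PARTITIONED BY (dt STRING)
STORED AS TEXTFILE;

-- Register two partitions; both start with the table's TEXTFILE format.
ALTER TABLE events ADD PARTITION (dt = '2018-07-26');
ALTER TABLE events ADD PARTITION (dt = '2018-07-27');

-- Partition-level form: switch a single partition to Parquet.
-- In Hive this is a metadata-only change, so it is done before any data is written.
ALTER TABLE events PARTITION (dt = '2018-07-27') SET FILEFORMAT PARQUET;

INSERT INTO events PARTITION (dt = '2018-07-26') VALUES (1, 'stored as text');
INSERT INTO events PARTITION (dt = '2018-07-27') VALUES (2, 'stored as parquet');

-- Table-level form: changes the format recorded for the table itself.
-- Assumption: partitions that already exist keep their own recorded format.
ALTER TABLE events SET FILEFORMAT PARQUET;

-- The query this change is meant to support: one SELECT over both formats.
SELECT id, msg, dt FROM events ORDER BY id;
```

Per the description above, the intent is that the final SELECT returns rows from both partitions even though they are stored in different formats.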
## How was this patch tested?
Unit tests are available in:
org.apache.spark.sql.hive.execution.MultiFormatTableSuite
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/krisgeus/spark pr-multiformat-partitions
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21893.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21893
----
commit 2498b95960c2dee18072cdb7ecd87d2ce6b949f4
Author: Kris Geusebroek <krisgeus@...>
Date: 2018-07-27T12:57:43Z
Support alter table/partition file format
commit 393416991562cf73d5304cc39c727cad53b2e610
Author: Kris Geusebroek <krisgeus@...>
Date: 2018-07-27T13:22:01Z
Support selecting from partitioned tables with different data formats
----