GitHub user dongjoon-hyun opened a pull request:
https://github.com/apache/spark/pull/19651
[SPARK-20682][SPARK-15474][SPARK-21791] Add new ORCFileFormat based on ORC 1.4.1
## What changes were proposed in this pull request?
Since [SPARK-2883](https://issues.apache.org/jira/browse/SPARK-2883),
Apache Spark has supported Apache ORC inside the `sql/hive` module, with a Hive
dependency. This PR adds a new ORC data source inside `sql/core`, with the aim
of eventually replacing the old one. It resolves the following three issues.
- SPARK-20682: Add new ORCFileFormat based on Apache ORC 1.4.1
- SPARK-15474: ORC data source fails to write and read back empty dataframe
- SPARK-21791: ORC should support column names with dot
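
For reviewers who want to try the two behavior fixes by hand, here is a minimal sketch, not code from this PR. It assumes a Spark build (2.3 or later) in which the `sql/core` ORC implementation can be selected with `spark.sql.orc.impl=native`; the object name `OrcSketch`, its helper methods, and the temporary paths are hypothetical and exist only for illustration.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession

// Hypothetical helper object, not part of this PR.
object OrcSketch {
  // SPARK-15474: write an empty DataFrame as ORC and return the column names read back.
  def emptyRoundTrip(spark: SparkSession, path: String): Seq[String] = {
    import spark.implicits._
    // limit(0) keeps the schema (id, name) but drops every row.
    val empty = Seq((1, "a")).toDF("id", "name").limit(0)
    empty.write.orc(path)
    spark.read.orc(path).schema.fieldNames.toSeq
  }

  // SPARK-21791: write a column whose name contains a dot and return the row count read back.
  def dottedColumnRoundTrip(spark: SparkSession, path: String): Long = {
    import spark.implicits._
    Seq(Some(1), None).toDF("col.dots").write.orc(path)
    // Backticks make Spark treat the dot as part of the name, not as a struct field access.
    spark.read.orc(path).select("`col.dots`").count()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("orc-sketch")
      // Assumption: this setting selects the sql/core ORC implementation (Spark 2.3+).
      .config("spark.sql.orc.impl", "native")
      .getOrCreate()
    val base = Files.createTempDirectory("orc-sketch").toString
    try {
      println(emptyRoundTrip(spark, s"$base/empty"))
      println(dottedColumnRoundTrip(spark, s"$base/dots"))
    } finally {
      spark.stop()
    }
  }
}
```

With the old Hive-based data source, the first round trip is the case SPARK-15474 reports as failing, and the second is the case SPARK-21791 reports as unsupported.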
## How was this patch tested?
Passes Jenkins with all existing tests, plus new tests for SPARK-15474
and SPARK-21791.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dongjoon-hyun/spark SPARK-20682
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19651.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19651
----
commit fdde27416fe54036afbd9b809a363e7871df67cf
Author: Dongjoon Hyun
Date: 2017-05-15T02:33:15Z
[SPARK-20682][SPARK-15474][SPARK-21791] Add new ORCFileFormat based on
Apache ORC 1.4.1
----