GitHub user cloud-fan opened a pull request:
https://github.com/apache/spark/pull/16060
[SPARK-17897][SQL] read Hive orc table with varchar column should not fail
## What changes were proposed in this pull request?
Spark SQL only has `StringType`, so when reading a Hive table with a varchar
column, we read that column as `StringType`. However, we still need to use a
varchar `ObjectInspector` to read the varchar column in the Hive table, which
means we need to know the actual column type on the Hive side.
In Spark 2.1, after https://github.com/apache/spark/pull/14363, we parse the
Hive type string into a catalyst type, which means the actual column type on
the Hive side is erased. We may then use a string `ObjectInspector` to read a
varchar column and fail.
This PR keeps the original Hive column type string in the metadata of
`StructField`, and uses it when we convert the field back to a Hive column.
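The mechanism can be illustrated with a minimal self-contained sketch. This is plain Python, not Spark's actual Scala API; the `HIVE_TYPE_STRING` metadata key and the helper names (`from_hive_column`, `to_hive_column_type`) are assumptions for illustration only:

```python
from dataclasses import dataclass, field

# Assumed metadata key under which the original Hive type string is kept.
HIVE_TYPE_STRING = "HIVE_TYPE_STRING"

@dataclass
class StructField:
    """Minimal stand-in for Spark's StructField: a column name, a
    catalyst type, and a free-form metadata map."""
    name: str
    data_type: str                      # catalyst type, e.g. "string"
    metadata: dict = field(default_factory=dict)

def from_hive_column(name: str, hive_type: str) -> StructField:
    """Parse a Hive column into a catalyst StructField. varchar/char are
    erased to "string", but the original Hive type string is preserved
    in the field metadata so it can be recovered later."""
    erased = hive_type.startswith(("varchar", "char"))
    catalyst_type = "string" if erased else hive_type
    md = {HIVE_TYPE_STRING: hive_type} if erased else {}
    return StructField(name, catalyst_type, md)

def to_hive_column_type(f: StructField) -> str:
    """When converting back to a Hive column (e.g. to choose the right
    ObjectInspector), prefer the preserved Hive type string over the
    erased catalyst type."""
    return f.metadata.get(HIVE_TYPE_STRING, f.data_type)

f = from_hive_column("name", "varchar(100)")
print(f.data_type)             # string       -- what Spark SQL sees
print(to_hive_column_type(f))  # varchar(100) -- what the Hive reader needs
```

The round trip is the point of the fix: the catalyst schema still sees only `StringType`, but the exact Hive type survives in the field metadata, so the varchar `ObjectInspector` can be selected instead of the string one.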
## How was this patch tested?
A newly added regression test.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cloud-fan/spark varchar
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/16060.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #16060
----
commit 71c9deab0c76af80551faf9533dc5e3dff76e488
Author: Wenchen Fan <[email protected]>
Date: 2016-11-29T11:11:02Z
read Hive orc table with varchar column
----