GitHub user cloud-fan opened a pull request:
https://github.com/apache/spark/pull/16060
[SPARK-17897][SQL] read Hive orc table with varchar column should not fail
## What changes were proposed in this pull request?
Spark SQL only has `StringType`, so when reading a Hive table with a varchar
column, we read that column as `StringType`. However, we still need to use a
varchar `ObjectInspector` to read the varchar column in the Hive table, which
means we need to know the actual column type on the Hive side.
In Spark 2.1, after https://github.com/apache/spark/pull/14363, we parse the
Hive type string into a catalyst type, which means the actual column type on
the Hive side is erased. We may then use a string `ObjectInspector` to read a
varchar column and fail.
This PR keeps the original Hive column type string in the metadata of
`StructField`, and uses it when we convert the field back to a Hive column.
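The mechanism can be illustrated with a minimal self-contained sketch. This is plain Python, not Spark's actual Scala API; the `HIVE_TYPE_STRING` metadata key and the helper names (`from_hive_column`, `to_hive_column_type`) are assumptions for illustration only:

```python
from dataclasses import dataclass, field

# Assumed metadata key under which the original Hive type string is kept.
HIVE_TYPE_STRING = "HIVE_TYPE_STRING"

@dataclass
class StructField:
    """Minimal stand-in for Spark's StructField: a column name, a
    catalyst type, and a free-form metadata map."""
    name: str
    data_type: str                      # catalyst type, e.g. "string"
    metadata: dict = field(default_factory=dict)

def from_hive_column(name: str, hive_type: str) -> StructField:
    """Parse a Hive column into a catalyst StructField. varchar/char are
    erased to "string", but the original Hive type string is preserved
    in the field metadata so it can be recovered later."""
    erased = hive_type.startswith(("varchar", "char"))
    catalyst_type = "string" if erased else hive_type
    md = {HIVE_TYPE_STRING: hive_type} if erased else {}
    return StructField(name, catalyst_type, md)

def to_hive_column_type(f: StructField) -> str:
    """When converting back to a Hive column (e.g. to choose the right
    ObjectInspector), prefer the preserved Hive type string over the
    erased catalyst type."""
    return f.metadata.get(HIVE_TYPE_STRING, f.data_type)

f = from_hive_column("name", "varchar(100)")
print(f.data_type)             # string       -- what Spark SQL sees
print(to_hive_column_type(f))  # varchar(100) -- what the Hive reader needs
```

The round trip is the point of the fix: the catalyst schema still sees only `StringType`, but the exact Hive type survives in the field metadata, so the varchar `ObjectInspector` can be selected instead of the string one.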
## How was this patch tested?
A newly added regression test.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cloud-fan/spark varchar
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/16060.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #16060
----
commit 71c9deab0c76af80551faf9533dc5e3dff76e488
Author: Wenchen Fan <[email protected]>
Date: 2016-11-29T11:11:02Z
read Hive orc table with varchar column
----