GitHub user hvanhovell opened a pull request:

    https://github.com/apache/spark/pull/16804

    [SPARK-19459][SQL] Add Hive datatype (char/varchar) to StructField metadata

    ## What changes were proposed in this pull request?
    Reading from an existing ORC table that contains `char` or `varchar` columns can fail with a `ClassCastException` if the table metadata was created by Spark. This happens because Spark internally replaces `char` and `varchar` columns with a `string` column.
    
    This PR fixes the problem by adding the original Hive type to the `StructField`'s metadata under the `HIVE_TYPE_STRING` key. This is picked up by the `HiveClient` and the ORC reader; see https://github.com/apache/spark/pull/16060 for more details on how the metadata is used.
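    The idea can be sketched as follows. This is a minimal, self-contained stand-in, not Spark's actual implementation: the real `StructField` and `MetadataBuilder` live in `org.apache.spark.sql.types`, and the `fromHiveColumn` helper below is hypothetical, modeled on the conversion this PR touches.

```scala
// Simplified stand-in for Spark's StructField: the Catalyst type plus
// a string-keyed metadata map (Spark's Metadata is richer than this).
case class StructField(name: String,
                       dataType: String,
                       metadata: Map[String, String] = Map.empty)

// Metadata key under which the original Hive type is preserved (from the PR).
val HIVE_TYPE_STRING = "HIVE_TYPE_STRING"

// Hypothetical conversion: char/varchar columns become Catalyst `string`,
// but the original Hive type is kept in the field's metadata so that the
// HiveClient and the ORC reader can recover it later.
def fromHiveColumn(name: String, hiveType: String): StructField =
  if (hiveType.startsWith("char(") || hiveType.startsWith("varchar("))
    StructField(name, "string", Map(HIVE_TYPE_STRING -> hiveType))
  else
    StructField(name, hiveType)

val f = fromHiveColumn("city", "varchar(32)")
assert(f.dataType == "string")
assert(f.metadata(HIVE_TYPE_STRING) == "varchar(32)")
println(f)
```

    Without the metadata entry, the round trip loses the `varchar(32)` type entirely, which is what caused the `ClassCastException` when the ORC reader later compared its file schema against the table schema.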
    
    ## How was this patch tested?
    Added a regression test to `OrcSourceSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hvanhovell/spark SPARK-19459

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16804.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16804
    
----
commit c6a5bf60b9d92dde5be2d0b60af42acf92095aa4
Author: Herman van Hovell <[email protected]>
Date:   2017-02-04T10:58:54Z

    Add Hive datatype (char/varchar) to struct field metadata. This fixes 
issues with char/varchar columns in ORC.

----


