Stamatis Zampetakis created HIVE-27199:
------------------------------------------

             Summary: Read TIMESTAMP WITH LOCAL TIME ZONE columns from text 
files using custom formats
                 Key: HIVE-27199
                 URL: https://issues.apache.org/jira/browse/HIVE-27199
             Project: Hive
          Issue Type: Improvement
          Components: Serializers/Deserializers
    Affects Versions: 4.0.0-alpha-2
            Reporter: Stamatis Zampetakis
            Assignee: Stamatis Zampetakis


Timestamp values come in many flavors and formats, and no single 
representation can satisfy everyone, especially when such values are stored 
in plain text/CSV files.

HIVE-9298 added a special SERDE property, {{timestamp.formats}}, that 
lets users supply custom timestamp patterns so that TIMESTAMP values 
read from files are parsed correctly.
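
As a rough sketch of how the property is set for a plain TIMESTAMP column (the 
table name {{ts_plain}} is hypothetical, and the pattern shown is just one 
possible choice):
{code:sql}
-- Hypothetical table with a plain TIMESTAMP column stored as text.
CREATE TABLE ts_plain (ts TIMESTAMP);
-- Register an additional pattern for values such as 2016-05-03T12:26:34.
ALTER TABLE ts_plain SET SERDEPROPERTIES ("timestamp.formats"="yyyy-MM-dd'T'HH:mm:ss");
{code}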

However, when the column type is TIMESTAMP WITH LOCAL TIME ZONE (LTZ), it is not 
possible to use a custom pattern, so whenever the built-in Hive parser does not 
match the format found in the file, a NULL value is returned.

Consider a text file, F1, with the following values:
{noformat}
2016-05-03 12:26:34
2016-05-03T12:26:34
{noformat}
and a table with a column declared as LTZ.
{code:sql}
CREATE TABLE ts_table (ts TIMESTAMP WITH LOCAL TIME ZONE);
LOAD DATA LOCAL INPATH './F1' INTO TABLE ts_table;

SELECT * FROM ts_table;
-- Result:
-- 2016-05-03 12:26:34.0 US/Pacific
-- NULL
{code}
To give more flexibility to users relying on the TIMESTAMP WITH LOCAL TIME 
ZONE datatype, and to align its behavior with the TIMESTAMP type, this JIRA 
aims to reuse the {{timestamp.formats}} property for both TIMESTAMP types.
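
Assuming the property is honored for LTZ columns as proposed, the earlier 
example might be declared as follows (a sketch of the intended usage, not 
confirmed behavior):
{code:sql}
CREATE TABLE ts_table (ts TIMESTAMP WITH LOCAL TIME ZONE);
-- Same property as for TIMESTAMP; the pattern is an illustrative choice.
ALTER TABLE ts_table SET SERDEPROPERTIES ("timestamp.formats"="yyyy-MM-dd'T'HH:mm:ss");
LOAD DATA LOCAL INPATH './F1' INTO TABLE ts_table;
-- Expectation under this proposal: the row 2016-05-03T12:26:34 is parsed
-- instead of yielding NULL.
SELECT * FROM ts_table;
{code}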

The work here focuses exclusively on simple text files, but the same could be 
done for other SERDEs, such as JSON.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
