[jira] [Commented] (HIVE-10488) cast DATE as TIMESTAMP returns incorrect values
[ https://issues.apache.org/jira/browse/HIVE-10488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521089#comment-14521089 ] Alexander Pivovarov commented on HIVE-10488: looks like Orc table contains int values instead of date values e.g. 1996-01-01=1994 I got very similar results as in the description when I removed quotation marks wrapping date values: {code} -- 1996-01-01=1994 hive select cast(1996-01-01 as timestamp); OK 1969-12-31 16:00:01.994 -- 2000-01-01=1998 hive select cast(2000-01-01 as timestamp); OK 1969-12-31 16:00:01.998 -- 2000-12-31=1957 hive select cast(2000-12-31 as timestamp); OK 1969-12-31 16:00:01.957 {code} my TimeZone is US/Pacific - this is why time is -8hr from 1970 (1969 4pm) The description shows 1969 7pm (-5 hours offset from 1970) - So, their timezone is US/Eastern cast DATE as TIMESTAMP returns incorrect values --- Key: HIVE-10488 URL: https://issues.apache.org/jira/browse/HIVE-10488 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.13.1 Reporter: N Campbell Assignee: Chaoyu Tang same data in textfile works same data loaded into an ORC table does not connection property of tez/mr makes no difference. select rnum, cdt, cast (cdt as timestamp) from tdt 0 null null 1 1996-01-01 1969-12-31 19:00:09.496 2 2000-01-01 1969-12-31 19:00:10.957 3 2000-12-31 1969-12-31 19:00:11.322 vs 0 null null 1 1996-01-01 1996-01-01 00:00:00.0 2 2000-01-01 2000-01-01 00:00:00.0 3 2000-12-31 2000-12-31 00:00:00.0 create table if not exists TDT ( RNUM int , CDT date ) STORED AS orc ; insert overwrite table TDT select * from text.TDT; 0|\N 1|1996-01-01 2|2000-01-01 3|2000-12-31 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10488) cast DATE as TIMESTAMP returns incorrect values
[ https://issues.apache.org/jira/browse/HIVE-10488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519997#comment-14519997 ] Alexander Pivovarov commented on HIVE-10488: Got it! Thank you. BTW, I remember it was one issue with date to timestamp conversion for negative unix time HIVE-10178. But it is fixed already and has nothing to do with Orc. {code} select cast(cast('1966-01-01 00:00:01' as timestamp) as date); 1966-02-02 {code} cast DATE as TIMESTAMP returns incorrect values --- Key: HIVE-10488 URL: https://issues.apache.org/jira/browse/HIVE-10488 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.13.1 Reporter: N Campbell Assignee: Chaoyu Tang same data in textfile works same data loaded into an ORC table does not connection property of tez/mr makes no difference. select rnum, cdt, cast (cdt as timestamp) from tdt 0 null null 1 1996-01-01 1969-12-31 19:00:09.496 2 2000-01-01 1969-12-31 19:00:10.957 3 2000-12-31 1969-12-31 19:00:11.322 vs 0 null null 1 1996-01-01 1996-01-01 00:00:00.0 2 2000-01-01 2000-01-01 00:00:00.0 3 2000-12-31 2000-12-31 00:00:00.0 create table if not exists TDT ( RNUM int , CDT date ) STORED AS orc ; insert overwrite table TDT select * from text.TDT; 0|\N 1|1996-01-01 2|2000-01-01 3|2000-12-31 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10488) cast DATE as TIMESTAMP returns incorrect values
[ https://issues.apache.org/jira/browse/HIVE-10488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519846#comment-14519846 ] Chaoyu Tang commented on HIVE-10488: [~the6campbells] I was also not able to reproduce the issue in Hive 1.2. Here are my steps: {code} create table testcastts (key int, datevalue date); insert into testcastts select 0, null from src limit 1; insert into testcastts select 1, date '1996-01-01' from src limit 1; insert into testcastts select 2, date '2000-01-01' from src limit 1; insert into testcastts select 3, date '2000-12-31' from src limit 1; --- select key, datevalue, cast(datevalue as timestamp) from testcastts; 0 NULLNULL 1 1996-01-01 1996-01-01 00:00:00 2 2000-01-01 2000-01-01 00:00:00 3 2000-12-31 2000-12-31 00:00:00 --- create table if not exists testcastorcts (key int, datevalue date) stored as orc; insert overwrite table testcastorcts select * from testcastts; select key, datevalue, cast(datevalue as timestamp) from testcastorcts; 0 NULLNULL 1 1996-01-01 1996-01-01 00:00:00 2 2000-01-01 2000-01-01 00:00:00 3 2000-12-31 2000-12-31 00:00:00 {code} Do you see any difference between above my test case and yours? Otherwise, I will resolve this JIRA as Not Reproducible. Thanks cast DATE as TIMESTAMP returns incorrect values --- Key: HIVE-10488 URL: https://issues.apache.org/jira/browse/HIVE-10488 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.13.1 Reporter: N Campbell Assignee: Chaoyu Tang same data in textfile works same data loaded into an ORC table does not connection property of tez/mr makes no difference. select rnum, cdt, cast (cdt as timestamp) from tdt 0 null null 1 1996-01-01 1969-12-31 19:00:09.496 2 2000-01-01 1969-12-31 19:00:10.957 3 2000-12-31 1969-12-31 19:00:11.322 vs 0 null null 1 1996-01-01 1996-01-01 00:00:00.0 2 2000-01-01 2000-01-01 00:00:00.0 3 2000-12-31 2000-12-31 00:00:00.0 create table if not exists TDT ( RNUM int , CDT date ) STORED AS orc ; insert overwrite table TDT select * from text.TDT; 0|\N 1|1996-01-01 2|2000-01-01 3|2000-12-31 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10488) cast DATE as TIMESTAMP returns incorrect values
[ https://issues.apache.org/jira/browse/HIVE-10488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519865#comment-14519865 ] Alexander Pivovarov commented on HIVE-10488: you do not have to put from src limit 1 in select statement in hive-1.2 cast DATE as TIMESTAMP returns incorrect values --- Key: HIVE-10488 URL: https://issues.apache.org/jira/browse/HIVE-10488 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.13.1 Reporter: N Campbell Assignee: Chaoyu Tang same data in textfile works same data loaded into an ORC table does not connection property of tez/mr makes no difference. select rnum, cdt, cast (cdt as timestamp) from tdt 0 null null 1 1996-01-01 1969-12-31 19:00:09.496 2 2000-01-01 1969-12-31 19:00:10.957 3 2000-12-31 1969-12-31 19:00:11.322 vs 0 null null 1 1996-01-01 1996-01-01 00:00:00.0 2 2000-01-01 2000-01-01 00:00:00.0 3 2000-12-31 2000-12-31 00:00:00.0 create table if not exists TDT ( RNUM int , CDT date ) STORED AS orc ; insert overwrite table TDT select * from text.TDT; 0|\N 1|1996-01-01 2|2000-01-01 3|2000-12-31 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10488) cast DATE as TIMESTAMP returns incorrect values
[ https://issues.apache.org/jira/browse/HIVE-10488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519863#comment-14519863 ] Alexander Pivovarov commented on HIVE-10488: Chaoyu, testcastts should be Orc table. Is Orc default table format in your hive config? Cast is working fine for textfile table for N. cast DATE as TIMESTAMP returns incorrect values --- Key: HIVE-10488 URL: https://issues.apache.org/jira/browse/HIVE-10488 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.13.1 Reporter: N Campbell Assignee: Chaoyu Tang same data in textfile works same data loaded into an ORC table does not connection property of tez/mr makes no difference. select rnum, cdt, cast (cdt as timestamp) from tdt 0 null null 1 1996-01-01 1969-12-31 19:00:09.496 2 2000-01-01 1969-12-31 19:00:10.957 3 2000-12-31 1969-12-31 19:00:11.322 vs 0 null null 1 1996-01-01 1996-01-01 00:00:00.0 2 2000-01-01 2000-01-01 00:00:00.0 3 2000-12-31 2000-12-31 00:00:00.0 create table if not exists TDT ( RNUM int , CDT date ) STORED AS orc ; insert overwrite table TDT select * from text.TDT; 0|\N 1|1996-01-01 2|2000-01-01 3|2000-12-31 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10488) cast DATE as TIMESTAMP returns incorrect values
[ https://issues.apache.org/jira/browse/HIVE-10488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519921#comment-14519921 ] Chaoyu Tang commented on HIVE-10488: Here are the desc formatted from two tables (testcastts and testcastorcts) I tested: {code} # Detailed Table Information Database: jira Owner: ctang CreateTime: Wed Apr 29 12:03:08 EDT 2015 LastAccessTime: UNKNOWN Protect Mode: None Retention: 0 Location: file:/user/hive/warehouse/apache/jira.db/testcastts Table Type: MANAGED_TABLE Table Parameters: COLUMN_STATS_ACCURATE true numFiles4 numRows 4 rawDataSize 40 totalSize 44 transient_lastDdlTime 1430323769 # Storage Information SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe InputFormat:org.apache.hadoop.mapred.TextInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Compressed: No Num Buckets:-1 Bucket Columns: [] Sort Columns: [] Storage Desc Params: serialization.format1 === key int datevalue date # Detailed Table Information Database: jira Owner: ctang CreateTime: Wed Apr 29 12:12:42 EDT 2015 LastAccessTime: UNKNOWN Protect Mode: None Retention: 0 Location: file:/user/hive/warehouse/apache/jira.db/testcastorcts Table Type: MANAGED_TABLE Table Parameters: COLUMN_STATS_ACCURATE true numFiles1 numRows 4 rawDataSize 184 totalSize 304 transient_lastDdlTime 1430324019 # Storage Information SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde InputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat Compressed: No Num Buckets:-1 Bucket Columns: [] Sort Columns: [] Storage Desc Params: serialization.format1 {code} BTW, I also queried with vectorized execution (set hive.vectorized.execution.enabled=true) for the ORC table testcastorcts, it also worked fine. cast DATE as TIMESTAMP returns incorrect values --- Key: HIVE-10488 URL: https://issues.apache.org/jira/browse/HIVE-10488 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.13.1 Reporter: N Campbell Assignee: Chaoyu Tang same data in textfile works same data loaded into an ORC table does not connection property of tez/mr makes no difference. select rnum, cdt, cast (cdt as timestamp) from tdt 0 null null 1 1996-01-01 1969-12-31 19:00:09.496 2 2000-01-01 1969-12-31 19:00:10.957 3 2000-12-31 1969-12-31 19:00:11.322 vs 0 null null 1 1996-01-01 1996-01-01 00:00:00.0 2 2000-01-01 2000-01-01 00:00:00.0 3 2000-12-31 2000-12-31 00:00:00.0 create table if not exists TDT ( RNUM int , CDT date ) STORED AS orc ; insert overwrite table TDT select * from text.TDT; 0|\N 1|1996-01-01 2|2000-01-01 3|2000-12-31 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10488) cast DATE as TIMESTAMP returns incorrect values
[ https://issues.apache.org/jira/browse/HIVE-10488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513641#comment-14513641 ] Alexander Pivovarov commented on HIVE-10488: I can not reproduce this issue in hive 1.2.0 I created 2 tables t3 - textfile t3o - Orc {code} hive desc formatted t3; OK # col_name data_type comment rnumint cdt date # Detailed Table Information Database: default Owner: apivovarov CreateTime: Sun Apr 26 23:58:29 PDT 2015 LastAccessTime: UNKNOWN Protect Mode: None Retention: 0 Location: hdfs://localhost/apps/apivovarov/warehouse/t3 Table Type: MANAGED_TABLE Table Parameters: transient_lastDdlTime 1430117909 # Storage Information SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe InputFormat:org.apache.hadoop.mapred.TextInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Compressed: No Num Buckets:-1 Bucket Columns: [] Sort Columns: [] Storage Desc Params: serialization.format1 Time taken: 0.098 seconds, Fetched: 27 row(s) {code} {code} hive desc formatted t3o; OK # col_name data_type comment rnumint cdt date # Detailed Table Information Database: default Owner: apivovarov CreateTime: Mon Apr 27 00:00:11 PDT 2015 LastAccessTime: UNKNOWN Protect Mode: None Retention: 0 Location: hdfs://localhost/apps/apivovarov/warehouse/t3o Table Type: MANAGED_TABLE Table Parameters: COLUMN_STATS_ACCURATE true numFiles1 numRows 4 rawDataSize 184 totalSize 302 transient_lastDdlTime 1430118011 # Storage Information SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde InputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat Compressed: No Num Buckets:-1 Bucket Columns: [] Sort Columns: [] Storage Desc Params: serialization.format1 Time taken: 0.096 seconds, Fetched: 32 row(s) {code} {code} hive select * from t3; OK 0 NULL 1 1996-01-01 2 2000-01-01 3 2000-12-31 Time taken: 0.086 seconds, Fetched: 4 row(s) {code} {code} hive select * from t3o; OK 0 NULL 1 1996-01-01 2 2000-01-01 3 2000-12-31 Time taken: 0.086 seconds, Fetched: 4 row(s) {code} {code} hive select rnum, cdt, cast (cdt as timestamp) from t3; OK 0 NULLNULL 1 1996-01-01 1996-01-01 00:00:00 2 2000-01-01 2000-01-01 00:00:00 3 2000-12-31 2000-12-31 00:00:00 Time taken: 0.091 seconds, Fetched: 4 row(s) {code} {code} hive select rnum, cdt, cast (cdt as timestamp) from t3o; OK 0 NULLNULL 1 1996-01-01 1996-01-01 00:00:00 2 2000-01-01 2000-01-01 00:00:00 3 2000-12-31 2000-12-31 00:00:00 Time taken: 0.108 seconds, Fetched: 4 row(s) {code} MR {code} hive select t3.rnum, t3.cdt, cast (t3.cdt as timestamp) cts, t3o.cdt cdt2, cast(t3o.cdt as timestamp) cts2 from t3 join t3o on (t3.rnum = t3o.rnum); Query ID = apivovarov_20150427000533_2734a9a1-63eb-45d4-83a4-4129ae3e7afc Total jobs = 1 15/04/27 00:05:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Execution log at: /tmp/apivovarov/apivovarov_20150427000533_2734a9a1-63eb-45d4-83a4-4129ae3e7afc.log 2015-04-27 00:05:37 Starting to launch local task to process map join;
[jira] [Commented] (HIVE-10488) cast DATE as TIMESTAMP returns incorrect values
[ https://issues.apache.org/jira/browse/HIVE-10488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514263#comment-14514263 ] N Campbell commented on HIVE-10488: --- Cast was using a column of type date and not literals or value expression. Variance was the format the data was in. cast DATE as TIMESTAMP returns incorrect values --- Key: HIVE-10488 URL: https://issues.apache.org/jira/browse/HIVE-10488 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.13.1 Reporter: N Campbell Assignee: Chaoyu Tang same data in textfile works same data loaded into an ORC table does not connection property of tez/mr makes no difference. select rnum, cdt, cast (cdt as timestamp) from tdt 0 null null 1 1996-01-01 1969-12-31 19:00:09.496 2 2000-01-01 1969-12-31 19:00:10.957 3 2000-12-31 1969-12-31 19:00:11.322 vs 0 null null 1 1996-01-01 1996-01-01 00:00:00.0 2 2000-01-01 2000-01-01 00:00:00.0 3 2000-12-31 2000-12-31 00:00:00.0 create table if not exists TDT ( RNUM int , CDT date ) STORED AS orc ; insert overwrite table TDT select * from text.TDT; 0|\N 1|1996-01-01 2|2000-01-01 3|2000-12-31 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10488) cast DATE as TIMESTAMP returns incorrect values
[ https://issues.apache.org/jira/browse/HIVE-10488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514489#comment-14514489 ] Alexander Pivovarov commented on HIVE-10488: Put 2015-01-02 to quotes in the example 3 Otherwise you cast integer 2012 to timestamp cast DATE as TIMESTAMP returns incorrect values --- Key: HIVE-10488 URL: https://issues.apache.org/jira/browse/HIVE-10488 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.13.1 Reporter: N Campbell Assignee: Chaoyu Tang same data in textfile works same data loaded into an ORC table does not connection property of tez/mr makes no difference. select rnum, cdt, cast (cdt as timestamp) from tdt 0 null null 1 1996-01-01 1969-12-31 19:00:09.496 2 2000-01-01 1969-12-31 19:00:10.957 3 2000-12-31 1969-12-31 19:00:11.322 vs 0 null null 1 1996-01-01 1996-01-01 00:00:00.0 2 2000-01-01 2000-01-01 00:00:00.0 3 2000-12-31 2000-12-31 00:00:00.0 create table if not exists TDT ( RNUM int , CDT date ) STORED AS orc ; insert overwrite table TDT select * from text.TDT; 0|\N 1|1996-01-01 2|2000-01-01 3|2000-12-31 -- This message was sent by Atlassian JIRA (v6.3.4#6332)