Bruce Robbins created SPARK-38133:
-------------------------------------

             Summary: Grouping by timestamp_ntz will sometimes corrupt the results
                 Key: SPARK-38133
                 URL: https://issues.apache.org/jira/browse/SPARK-38133
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: Bruce Robbins

Assume this data:
{noformat}
create or replace temp view v1 as
select * from values
  (1, timestamp_ntz'2012-01-01 00:00:00', 10000),
  (2, timestamp_ntz'2012-01-01 00:00:00', 20000),
  (1, timestamp_ntz'2012-01-01 00:00:00', 5000),
  (1, timestamp_ntz'2013-01-01 00:00:00', 48000),
  (2, timestamp_ntz'2013-01-01 00:00:00', 30000)
as data(a, b, c);
{noformat}
Run the following query:
{noformat}
select *
from v1
pivot (
  sum(c)
  for a in (1, 2)
);
{noformat}
You get incorrect results for the group-by column:
{noformat}
2012-01-01 19:05:19.476736    15000    20000
2013-01-01 19:05:19.476736    48000    30000
Time taken: 2.65 seconds, Fetched 2 row(s)
{noformat}
Actually, _whenever_ the TungstenAggregationIterator is used to group by a timestamp_ntz column, you get incorrect results:
{noformat}
set spark.sql.codegen.wholeStage=false;
select a, b, sum(c) from v1 group by a, b;
{noformat}
This query produces
{noformat}
2    2012-01-01 09:32:39.738368    20000
1    2013-01-01 09:32:39.738368    48000
2    2013-01-01 09:32:39.738368    30000
Time taken: 1.927 seconds, Fetched 4 row(s)
{noformat}
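
For comparison, here is a rough sketch of what the two queries in the report should return, computed in plain Python rather than Spark. It is not part of the original report: the helper names (expected_pivot, expected_group_by) are made up for illustration, and the rows are copied by hand from the v1 view. It only models the grouping arithmetic, so it says nothing about where the corruption happens inside Spark.

```python
from collections import defaultdict
from datetime import datetime

# Rows copied from the v1 view in the report: (a, b, c).
rows = [
    (1, datetime(2012, 1, 1), 10000),
    (2, datetime(2012, 1, 1), 20000),
    (1, datetime(2012, 1, 1), 5000),
    (1, datetime(2013, 1, 1), 48000),
    (2, datetime(2013, 1, 1), 30000),
]


def expected_pivot(rows, pivot_values=(1, 2)):
    """Group by b and sum c per value of a, mirroring the pivot query."""
    sums = defaultdict(lambda: defaultdict(int))
    for a, b, c in rows:
        sums[b][a] += c
    # None stands in for a pivot value that has no rows in a group.
    return [(b, *(sums[b].get(a) for a in pivot_values)) for b in sorted(sums)]


def expected_group_by(rows):
    """Sum c per (a, b) pair, mirroring `select a, b, sum(c) ... group by a, b`."""
    sums = defaultdict(int)
    for a, b, c in rows:
        sums[(a, b)] += c
    return sorted((a, b, total) for (a, b), total in sums.items())


if __name__ == "__main__":
    for b, *totals in expected_pivot(rows):
        print(b.isoformat(sep=" "), *totals)
    for a, b, total in expected_group_by(rows):
        print(a, b.isoformat(sep=" "), total)
```

Under these assumptions the group-by column keeps its midnight time component (00:00:00) in every row, so the reported 19:05:19.476736 and 09:32:39.738368 values are the corrupted part; the sums shown in the report match what this sketch computes.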