bersprockets commented on PR #36546: URL: https://github.com/apache/spark/pull/36546#issuecomment-1126602057
This PR brings `Date` in line with `Timestamp` (that is, time-zone aware). But even `Timestamp` sequences have some anomalies, e.g. (from a Spark without my change, in the America/Los_Angeles time zone):
```
spark-sql> select element_at(sequence(timestamp'2021-01-01', timestamp'2021-01-01' + interval 82 hours * 97, interval 82 hours), 97) as a;
2021-11-24 23:00:00
Time taken: 0.076 seconds, Fetched 1 row(s)
spark-sql> select timestamp'2021-01-01' + interval 82 hours * 96 as x;
2021-11-25 00:00:00
Time taken: 0.053 seconds, Fetched 1 row(s)
spark-sql>
```
The 96th (origin 0) element of the sequence from the first query is 1 hour less than the result of the second query. One would think they should be the same (both supposedly being `'2021-01-01' + interval 82 hours * 96`), but the "fall back" is being handled differently around element 92 (origin 0) of the sequence.

`Date` sequences also have (and will continue to have, after this PR) the same anomaly:
```
spark-sql> select date'2021-01-01' + interval 82 hours * 96 as x;
2021-11-25 00:00:00
Time taken: 4.146 seconds, Fetched 1 row(s)
spark-sql> select element_at(sequence(date'2021-01-01', date'2022-01-05', interval 82 hours), 97) as a;
2021-11-24
Time taken: 0.125 seconds, Fetched 1 row(s)
spark-sql>
```
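The root of this kind of anomaly is that, across a DST transition, there are two reasonable ways to add an hour-based interval to a time-zone-aware value: shift the local wall clock, or shift the absolute instant. The sketch below is not Spark's implementation; it is a minimal Python illustration (using the stdlib `zoneinfo`) of how the two approaches diverge by exactly one hour across the America/Los_Angeles "fall back":

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

tz = ZoneInfo("America/Los_Angeles")
# Start just before the 2021-11-07 fall-back transition.
start = datetime(2021, 11, 1, tzinfo=tz)
step = timedelta(days=7)  # 168 hours, crossing the transition

# Wall-clock arithmetic: Python's aware datetime + timedelta shifts the
# local calendar fields and re-resolves the offset afterwards.
wall = start + step

# Absolute-time arithmetic: convert to UTC, add the same duration as an
# exact number of elapsed seconds, convert back.
absolute = (start.astimezone(timezone.utc) + step).astimezone(tz)

print(wall)      # 2021-11-08 00:00:00-08:00 (same local clock reading)
print(absolute)  # 2021-11-07 23:00:00-08:00 (same elapsed time, 1h earlier local)
```

If a sequence generator uses one style of addition internally while `timestamp + interval` uses the other (or mixes them at different points in the loop), the Nth element and the directly computed `start + interval * N` can land an hour apart, which is the shape of the discrepancy shown above.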
