LinhongLiu edited a comment on issue #26134: [SPARK-29486][SQL] CalendarInterval should have 3 fields: months, days and microseconds
URL: https://github.com/apache/spark/pull/26134#issuecomment-542659522

@MaxGekk @cloud-fan thanks for your review, here are my answers to the questions:

> local timestamp 2019-11-03 02:00:00 will be showed on local clock twice

My solution won't fix this, but neither does the original code. Actually, I don't think this matters.

> could you show concrete example when old (current) implementation returns wrong results, and your changes fix that

The first case: adding `1 day` to `2019-11-02 12:00:00 PST` in the current code results in `2019-11-03 11:00:00 PST`. But as you said, this doesn't produce a wrong result.

The second case is in Structured Streaming. If we do a window aggregation like this:
```
df.groupBy(window($"time", "1 day")).sum
```
Since this function takes the time zone into account, the current implementation may produce windows like these:

`2019-11-01 00:00:00` - `2019-11-01 23:59:59`
`2019-11-02 00:00:00` - `2019-11-02 23:59:59`
`2019-11-03 00:00:00` - `2019-11-03 22:59:59`
`2019-11-03 23:00:00` - `2019-11-04 22:59:59`

I don't think users want to see aggregations over windows like these.
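For anyone who wants to reproduce the windowing behavior described above, here is a minimal sketch. It assumes a local Spark session with the session time zone set to `America/Los_Angeles` and uses hypothetical sample timestamps around the 2019-11-03 DST fall-back; the exact window boundaries printed depend on whether the day is added as a fixed number of microseconds (current behavior) or as a calendar day (the behavior proposed in this PR).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object WindowDstSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("window-dst-sketch")
      // Use a zone that falls back on 2019-11-03 so the DST effect is visible.
      .config("spark.sql.session.timeZone", "America/Los_Angeles")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical events straddling the 2019-11-03 DST transition.
    val events = Seq(
      "2019-11-02 12:00:00",
      "2019-11-03 12:00:00",
      "2019-11-04 12:00:00"
    ).toDF("time_str")
      .select(to_timestamp($"time_str").as("time"), lit(1).as("value"))

    // One-day tumbling windows; with microsecond-based interval arithmetic,
    // the window boundaries drift by one hour after the DST transition.
    events
      .groupBy(window($"time", "1 day"))
      .agg(sum($"value").as("cnt"))
      .orderBy($"window.start")
      .show(truncate = false)

    spark.stop()
  }
}
```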
