LinhongLiu edited a comment on issue #26134: [SPARK-29486][SQL] 
CalendarInterval should have 3 fields: months, days and microseconds
URL: https://github.com/apache/spark/pull/26134#issuecomment-542659522
 
 
   @MaxGekk @cloud-fan thanks for your review; here are my answers to the questions:
   
   > local timestamp 2019-11-03 02:00:00 will be shown on the local clock twice
   
   My solution won't fix this, but neither does the original code. Actually, I don't think this matters.
   
   
   > could you show a concrete example where the old (current) implementation returns wrong results, and your changes fix that
   
   The first case: adding `1 day` to `2019-11-02 12:00:00 PST` in the current code results in `2019-11-03 11:00:00 PST`. But as you said, this doesn't produce a wrong result.
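   
   To make this concrete, here is a minimal sketch in plain Scala using `java.time` (not Spark code; the zone `America/Los_Angeles` and the printed offsets are my assumptions about the scenario) contrasting the two interpretations of `1 day` across the 2019-11-03 DST fall-back:
   ```scala
   import java.time.{Duration, Period, ZoneId, ZonedDateTime}

   val la = ZoneId.of("America/Los_Angeles")
   val t  = ZonedDateTime.of(2019, 11, 2, 12, 0, 0, 0, la) // 2019-11-02T12:00-07:00 (PDT)

   // Current behavior: "1 day" is stored as 86400 seconds worth of microseconds,
   // so we add an exact duration and the wall clock shifts across the DST change.
   println(t.plus(Duration.ofDays(1))) // 2019-11-03T11:00-08:00

   // Proposed behavior: "1 day" is a calendar day (the new `days` field),
   // so the local wall-clock time is preserved.
   println(t.plus(Period.ofDays(1)))   // 2019-11-03T12:00-08:00
   ```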
   
   The second case is in Structured Streaming: if we do a window aggregation like this:
   ```scala
   df.groupBy(window($"time", "1 day")).sum
   ```
   Since this function takes the timezone into account, the current implementation may produce windows like this:
   `2019-11-01 00:00:00` - `2019-11-01 23:59:59`
   `2019-11-02 00:00:00` - `2019-11-02 23:59:59`
   `2019-11-03 00:00:00` - `2019-11-03 22:59:59`
   `2019-11-03 23:00:00` - `2019-11-04 22:59:59`
   I don't think users want to see aggregations over windows like these (see the sketch below for how these boundaries arise).
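   
   For reference, here is a sketch (again plain Scala with `java.time`, not Spark's internal window code; I'm assuming the window boundary is computed as a fixed width in microseconds) of why the boundary drifts after the fall-back:
   ```scala
   import java.time.{ZoneId, ZonedDateTime}

   val la = ZoneId.of("America/Los_Angeles")

   // "1 day" as a fixed number of seconds (what 86400000000 microseconds amounts to).
   val windowSeconds = 24L * 60 * 60

   // Window starting at local midnight on the fall-back day (a 25-hour local day).
   val start = ZonedDateTime.of(2019, 11, 3, 0, 0, 0, 0, la).toInstant
   val end   = start.plusSeconds(windowSeconds)

   println(ZonedDateTime.ofInstant(start, la)) // 2019-11-03T00:00-07:00
   println(ZonedDateTime.ofInstant(end, la))   // 2019-11-03T23:00-08:00, not midnight
   ```
   With a calendar-day interval, the next window would start at local midnight again instead of at 23:00.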
   
