MaxGekk opened a new pull request #28809:
URL: https://github.com/apache/spark/pull/28809


   ### What changes were proposed in this pull request?
   Fix a bug in microseconds rebasing during transitions from one standard 
time zone offset to another. This PR changes the implementation of 
`rebaseGregorianToJulianMicros`, which performs rebasing via local 
timestamps. When the local timestamp falls into an overlap:
   1. Check whether the original instant is the earlier or the later instant of 
the overlapped local timestamp.
   2. If it is the earlier instant, take the zone and DST offsets from the 
previous day.
   3. Otherwise, set the time zone offsets of the Julian timestamp from the 
next day.
   
   Note: The fix assumes that transitions cannot happen more often than once 
per 2 days.
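
The steps above can be sketched in a toy model. This is a minimal illustration, not the Spark implementation: it assumes a hypothetical zone with a single transition (modelled on the `Asia/Hong_Kong` case), and the names `offset_at`, `in_overlap` and `offset_for_rebased` are invented for the example. In this toy model the offset is already known from the instant, so the day shift is redundant; it only shows the idea of borrowing offsets from a neighbouring day when a calendar API resolves local fields to just one of the two possible instants.

```python
from datetime import datetime, timedelta

# Hypothetical single-transition zone (all values are assumptions for the sketch).
TRANSITION_UTC = datetime(1945, 11, 17, 17, 0)  # 02:00 local at UTC+9
OFFSET_BEFORE = timedelta(hours=9)
OFFSET_AFTER = timedelta(hours=8)
ONE_DAY = timedelta(days=1)


def offset_at(instant_utc):
    """UTC offset in force at the given instant (naive UTC datetime)."""
    return OFFSET_BEFORE if instant_utc < TRANSITION_UTC else OFFSET_AFTER


def in_overlap(local):
    """True if this wall-clock reading occurs twice around the transition."""
    return TRANSITION_UTC + OFFSET_AFTER <= local < TRANSITION_UTC + OFFSET_BEFORE


def offset_for_rebased(instant_utc):
    """Offset to pin on the rebased timestamp for the given instant."""
    original = offset_at(instant_utc)
    local = instant_utc + original
    if not in_overlap(local):
        return original
    # Earlier instant: take offsets from the previous day.
    # Later instant: take offsets from the next day.
    # This relies on transitions being at least 2 days apart.
    shift = -ONE_DAY if original == OFFSET_BEFORE else ONE_DAY
    return offset_at(instant_utc + shift)


earlier = datetime(1945, 11, 17, 16, 30)  # 01:30 local, first occurrence
later = datetime(1945, 11, 17, 17, 30)    # 01:30 local, second occurrence
print(offset_for_rebased(earlier), offset_for_rebased(later))
```

Both instants share the same local reading, yet each keeps the offset that was in force on its own side of the transition.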
   
   ### Why are the changes needed?
   The current implementation handles overlapping timestamps only during 
daylight saving time transitions, but an overlap can also happen during a 
transition from one standard time zone offset to another. For example, in 
`Asia/Hong_Kong` the time zone switched from `Japan Standard Time` (UTC+9) to 
`Hong Kong Time` (UTC+8) on _Sunday, 18 November, 1945 01:59:59 AM_. The 
changes handle this special case as well.
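
To make the overlap concrete, here is a small sketch using fixed offsets only. It does not consult a time zone database; the two offsets and the date come from the example above, and the chosen wall-clock reading of 01:30 is an assumption picked to fall inside the repeated hour.

```python
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9))  # offset before the switch
HKT = timezone(timedelta(hours=8))  # offset after the switch

# The same wall-clock reading interpreted under each offset.
local = datetime(1945, 11, 18, 1, 30, 0)
earlier = local.replace(tzinfo=JST)
later = local.replace(tzinfo=HKT)

# One local timestamp, two distinct instants one hour apart.
gap = later - earlier
print(earlier.astimezone(timezone.utc).isoformat())
print(later.astimezone(timezone.utc).isoformat())
print(gap)
```

A rebasing routine that goes through local timestamps has to decide which of these two instants the original value referred to.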
   
   ### Does this PR introduce _any_ user-facing change?
   It might affect micros rebasing for timestamps before the Common Era when 
the non-optimised version of `rebaseGregorianToJulianMicros()` is used directly.
   
   ### How was this patch tested?
   1. By existing tests in `DateTimeUtilsSuite`, `RebaseDateTimeSuite`, 
`DateFunctionsSuite`, `DateExpressionsSuite` and `TimestampFormatterSuite`.
   2. Added a new test to `RebaseDateTimeSuite`.
   3. Regenerated `gregorian-julian-rebase-micros.json` with a step of 30 
minutes, and got the same JSON file. The JSON file isn't affected because 
it was previously generated with a step of 1 week, so the spike in 
diffs/switch points during the 1 hour of timestamp overlap wasn't detected.
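
The effect of the step size can be illustrated with a hedged sketch. It is not the generator used for the JSON file: the helper `samples_hit_window` and the sampling start of 1 January 1945 at midnight are assumptions made for the example. It only shows that a coarse grid can step over a one-hour window that a 30-minute grid lands in.

```python
from datetime import datetime, timedelta


def samples_hit_window(start, end, step, window_start, window_end):
    """True if any sample start, start+step, ... (before end) falls in the window."""
    t = start
    while t < end:
        if window_start <= t < window_end:
            return True
        t += step
    return False


# Assumed sampling range and the one-hour overlap window from the example.
start, end = datetime(1945, 1, 1), datetime(1946, 1, 1)
window = (datetime(1945, 11, 18, 1, 0), datetime(1945, 11, 18, 2, 0))

weekly = samples_hit_window(start, end, timedelta(weeks=1), *window)
half_hourly = samples_hit_window(start, end, timedelta(minutes=30), *window)
print(weekly, half_hourly)
```

With these assumed inputs the weekly samples all fall at midnight and miss the window, while the 30-minute samples land inside it.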
   
   Authored-by: Max Gekk <max.g...@gmail.com>
   Signed-off-by: Wenchen Fan <wenc...@databricks.com>
   (cherry picked from commit c259844df8b6690b752a1c67b241de2981cdb5fe)
   Signed-off-by: Max Gekk <max.g...@gmail.com>




