GitHub user JoshRosen commented on the issue:
https://github.com/apache/spark/pull/13652
I think a lot of my own confusion here stems from trying to reason through
what should happen if I take `java.sql.Date.valueOf("2016-01-01")` and convert
it into Spark's internal date representation. `Date.valueOf` appears to be
time-zone sensitive in the sense that the date's internal timestamp will be
different depending on the JVM's local timezone, so the result of
`java.sql.Date.valueOf("2016-01-01").getTime()` will vary from JVM to JVM.
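    For concreteness, here's a quick standalone Scala sketch (not code from this PR) illustrating that timezone-sensitivity:

    ```scala
    import java.sql.Date
    import java.util.TimeZone

    object DateValueOfDemo {
      def main(args: Array[String]): Unit = {
        // Parse the same date string under two different JVM default time zones.
        for (tz <- Seq("UTC", "America/Los_Angeles")) {
          TimeZone.setDefault(TimeZone.getTimeZone(tz))
          val millis = Date.valueOf("2016-01-01").getTime
          println(s"$tz -> $millis")
        }
        // UTC                 -> 1451606400000 (2016-01-01T00:00:00Z)
        // America/Los_Angeles -> 1451635200000 (2016-01-01T08:00:00Z)
      }
    }
    ```

    The same string maps to a different instant depending on the default zone, because `Date.valueOf` produces midnight of that date in the JVM's local timezone.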
When we convert a `java.sql.Date` into an `Int` to be stored in an
`UnsafeRow` we call `DateTimeUtils.fromJavaDate`, which is implemented as
`millisToDays(date.getTime)`. The value passed to `millisToDays` is a UTC
timestamp which represents the start of that day in the user's local timezone,
so I'm a bit confused about why the first step of `millisToDays` adds
`threadLocalLocalTimeZone.get().getOffset(millisUtc)` before truncating to
compute the day offset. Why aren't we subtracting the offset in order to
normalize the time back to the start of the day in UTC?
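    To make the question concrete, here's a simplified Scala sketch of that conversion path (the `MILLIS_PER_DAY` constant and the use of `TimeZone.getDefault` in place of the thread-local cached zone are my simplifications, not the exact Spark code):

    ```scala
    import java.util.TimeZone

    object MillisToDaysSketch {
      val MILLIS_PER_DAY: Long = 24L * 60 * 60 * 1000

      // Simplified millisToDays: shift the UTC instant by the local zone's
      // offset, then floor-divide by the length of a day.
      def millisToDays(millisUtc: Long): Int = {
        val millisLocal = millisUtc + TimeZone.getDefault.getOffset(millisUtc)
        Math.floor(millisLocal.toDouble / MILLIS_PER_DAY).toInt
      }

      def main(args: Array[String]): Unit = {
        TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"))
        // In LA, Date.valueOf("2016-01-01").getTime is 2016-01-01T08:00Z
        // (1451635200000); LA's offset at that instant is -28800000, so
        // adding it lands back on 1451606400000, the UTC start of the day.
        val millisUtc = java.sql.Date.valueOf("2016-01-01").getTime
        println(millisToDays(millisUtc)) // 16801 days since 1970-01-01
      }
    }
    ```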