srowen commented on a change in pull request #24181: [SPARK-27242][SQL] Make
formatting TIMESTAMP/DATE literals independent from the default time zone
URL: https://github.com/apache/spark/pull/24181#discussion_r269074061
##########
File path: docs/sql-migration-guide-upgrade.md
##########
@@ -96,13 +96,17 @@ displayTitle: Spark SQL Upgrading Guide
- The `weekofyear`, `weekday`, `dayofweek`, `date_trunc`, `from_utc_timestamp`, `to_utc_timestamp`, and `unix_timestamp` functions use the java.time API for calculating the week number of the year and the day number of the week, as well as for conversions from/to TimestampType values in the UTC time zone.
- The JDBC options `lowerBound` and `upperBound` are converted to TimestampType/DateType values in the same way as casting strings to TimestampType/DateType values. The conversion is based on the Proleptic Gregorian calendar and the time zone defined by the SQL config `spark.sql.session.timeZone`. In Spark version 2.4 and earlier, the conversion is based on the hybrid calendar (Julian + Gregorian) and on the default system time zone.
+
+ - Formatting of `TIMESTAMP` and `DATE` literals.
- In Spark version 2.4 and earlier, invalid time zone ids are silently ignored and replaced by the GMT time zone, for example, in the `from_utc_timestamp` function. Since Spark 3.0, such time zone ids are rejected, and Spark throws `java.time.DateTimeException`.
- In Spark version 2.4 and earlier, the `current_timestamp` function returns
a timestamp with millisecond resolution only. Since Spark 3.0, the function can
return the result with microsecond resolution if the underlying clock available
on the system offers such resolution.
- In Spark version 2.4 and earlier, when reading a Hive SerDe table with Spark native data sources (Parquet/ORC), Spark infers the actual file schema and updates the table schema in the metastore. Since Spark 3.0, Spark no longer infers the schema. This should not cause any problems for end users, but if it does, set `spark.sql.hive.caseSensitiveInferenceMode` to `INFER_AND_SAVE`.
+ - Since Spark 3.0, `TIMESTAMP` literals are converted to strings using the
SQL config `spark.sql.session.timeZone`, and `DATE` literals are formatted
using the UTC time zone. In Spark version 2.4 and earlier, both conversions use
the default time zone of the Java virtual machine.
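
A minimal sketch of the behavior described in the added bullet, assuming Spark 3.0 semantics and a local SparkSession; the session time zone id and the literal values are illustrative only, not taken from the pull request:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: the time zone id and the literals are examples chosen for
// illustration of the documented behavior.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("literal-formatting-sketch")
  .getOrCreate()

// TIMESTAMP literals are rendered using spark.sql.session.timeZone ...
spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
spark.sql("SELECT TIMESTAMP '2019-03-26 00:00:00'").show(truncate = false)

// ... while DATE literals are formatted in UTC, per the bullet above.
spark.sql("SELECT DATE '2019-03-26'").show(truncate = false)
```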
Review comment:
I think this relates to our other conversation about how to render dates.
Why not render DATE w.r.t. the session timezone if TIMESTAMP is too? The way we
represent it, it's also a 'timestamp' of sorts, which is itself a problem, I
think, unless I'm really misunderstanding the actual intended semantics of a
'date' by itself.
Forget 'current_date'; if I parse "2019-03-26" to a DateType and save it, and then render it again now, I may not get "2019-03-26" after this change, right? That seems problematic. Of course all bets are off if my session timezone has changed, and that's its own problem, but is this better?
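
A small sketch of the round trip in question, assuming a local SparkSession; the time zone id and the date string are placeholders, not values from this pull request:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.to_date

// Sketch of the round trip described above; the time zone id and the date
// string are examples only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("date-roundtrip-sketch")
  .getOrCreate()
import spark.implicits._

// Parse the string to a DateType value under one session time zone ...
spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
val parsed = Seq("2019-03-26").toDF("s").select(to_date($"s").as("d"))

// ... then render it again; the question is whether the displayed value is
// still "2019-03-26" once DATE values are formatted in UTC rather than in the
// JVM default (or session) time zone.
parsed.show()
```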