srowen commented on a change in pull request #24181: [SPARK-27242][SQL] Make
formatting TIMESTAMP/DATE literals independent from the default time zone
URL: https://github.com/apache/spark/pull/24181#discussion_r269074061
##########
File path: docs/sql-migration-guide-upgrade.md
##########
@@ -96,13 +96,17 @@ displayTitle: Spark SQL Upgrading Guide
- The `weekofyear`, `weekday`, `dayofweek`, `date_trunc`, `from_utc_timestamp`, `to_utc_timestamp`, and `unix_timestamp` functions use the java.time API for calculating the week number of the year and the day number of the week, as well as for conversions from/to TimestampType values in the UTC time zone.
- The JDBC options `lowerBound` and `upperBound` are converted to TimestampType/DateType values in the same way as casting strings to TimestampType/DateType values. The conversion is based on the Proleptic Gregorian calendar and the time zone defined by the SQL config `spark.sql.session.timeZone`. In Spark version 2.4 and earlier, the conversion is based on the hybrid calendar (Julian + Gregorian) and on the default system time zone.
+
+ - Formatting of `TIMESTAMP` and `DATE` literals.
- In Spark version 2.4 and earlier, invalid time zone ids are silently ignored and replaced by the GMT time zone, for example, in the `from_utc_timestamp` function. Since Spark 3.0, such time zone ids are rejected, and Spark throws `java.time.DateTimeException`.
- In Spark version 2.4 and earlier, the `current_timestamp` function returns
a timestamp with millisecond resolution only. Since Spark 3.0, the function can
return the result with microsecond resolution if the underlying clock available
on the system offers such resolution.
- In Spark version 2.4 and earlier, when reading a Hive SerDe table with Spark native data sources (Parquet/ORC), Spark infers the actual file schema and updates the table schema in the metastore. Since Spark 3.0, Spark no longer infers the schema. This should not cause any problems for end users, but if it does, set `spark.sql.hive.caseSensitiveInferenceMode` to `INFER_AND_SAVE`.
+ - Since Spark 3.0, `TIMESTAMP` literals are converted to strings using the
SQL config `spark.sql.session.timeZone`, and `DATE` literals are formatted
using the UTC time zone. In Spark version 2.4 and earlier, both conversions use
the default time zone of the Java virtual machine.
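
A minimal sketch of the behavior described in the added bullet, assuming Spark 3.0 semantics and a local SparkSession; the session time zone id and the literal values are illustrative only, not taken from the pull request:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: the time zone id and the literals are examples chosen for
// illustration of the documented behavior.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("literal-formatting-sketch")
  .getOrCreate()

// TIMESTAMP literals are rendered using spark.sql.session.timeZone ...
spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
spark.sql("SELECT TIMESTAMP '2019-03-26 00:00:00'").show(truncate = false)

// ... while DATE literals are formatted in UTC, per the bullet above.
spark.sql("SELECT DATE '2019-03-26'").show(truncate = false)
```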
Review comment:
I think this relates to our other conversation about how to render dates.
Why not render DATE w.r.t. the session timezone if TIMESTAMP is too? The way we
represent it, it's also a 'timestamp' of sorts, which is itself a problem, I
think, unless I'm really misunderstanding the actual intended semantics of a
'date' by itself.
Forget 'current_date'; if I parse "2019-03-26" to a DateType and save it, and then render it again now, I may not get "2019-03-26" after this change, right? That seems problematic. Of course all bets are off if my session timezone has changed, and that's its own problem, but is this better?
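
A small sketch of the round trip in question, assuming a local SparkSession; the time zone id and the date string are placeholders, not values from this pull request:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.to_date

// Sketch of the round trip described above; the time zone id and the date
// string are examples only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("date-roundtrip-sketch")
  .getOrCreate()
import spark.implicits._

// Parse the string to a DateType value under one session time zone ...
spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
val parsed = Seq("2019-03-26").toDF("s").select(to_date($"s").as("d"))

// ... then render it again; the question is whether the displayed value is
// still "2019-03-26" once DATE values are formatted in UTC rather than in the
// JVM default (or session) time zone.
parsed.show()
```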