MaxGekk commented on a change in pull request #23662:
[SPARK-26740][SPARK-26654][SQL] Make statistics of timestamp/date columns
independent from system time zones
URL: https://github.com/apache/spark/pull/23662#discussion_r252965646
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
##########
@@ -472,14 +474,24 @@ object CatalogColumnStat extends Logging {
private val KEY_MAX_LEN = "maxLen"
private val KEY_HISTOGRAM = "histogram"
+ val VERSION = 2
Review comment:
Just to be clear, with the changes:
- Spark 3.0 will be able to restore stats written by Spark 3.0 and by
previous versions
- Spark 2.4 and earlier versions can parse stats written by Spark 3.0
(stats with higher versions are not rejected, and the serialized timestamp
format is accepted by `Timestamp.valueOf`), but correct conversion to internal
values is possible only when the system time zone is set to `UTC`.
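To illustrate the second point, here is a minimal sketch (not Spark code; the class and method names are hypothetical) of why a pre-3.0 reader restores timestamp stats correctly only under `UTC`: `Timestamp.valueOf` interprets the serialized string in the JVM's default time zone, so the same string parses to different instants under different zones.

```java
import java.sql.Timestamp;
import java.util.TimeZone;

// Hypothetical demo: parsing a serialized timestamp stat is time-zone dependent.
public class StatTzDemo {
    // Parse the same stat string under two default time zones and return the
    // difference between the resulting instants, in milliseconds.
    static long parseSkewMillis(String serializedStat) {
        TimeZone saved = TimeZone.getDefault();
        try {
            TimeZone.setDefault(TimeZone.getTimeZone("UTC"));
            long utcMillis = Timestamp.valueOf(serializedStat).getTime();
            TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"));
            long pstMillis = Timestamp.valueOf(serializedStat).getTime();
            return pstMillis - utcMillis;
        } finally {
            TimeZone.setDefault(saved);  // restore the original default zone
        }
    }

    public static void main(String[] args) {
        // Jan 31 is outside DST, so Los Angeles is UTC-8: an 8-hour skew.
        System.out.println(parseSkewMillis("2019-01-31 00:00:00"));
    }
}
```

The skew means a pre-3.0 reader running in a non-UTC zone silently restores min/max timestamp stats shifted by the zone offset, which is why the comment says correct conversion requires the system time zone to be `UTC`.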
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services