HyukjinKwon commented on a change in pull request #23150: [SPARK-26178][SQL] Use java.time API for parsing timestamps and dates from CSV
URL: https://github.com/apache/spark/pull/23150#discussion_r242021998
##########
File path: docs/sql-migration-guide-upgrade.md
##########
@@ -33,6 +33,8 @@ displayTitle: Spark SQL Upgrading Guide
   - Spark applications which are built with Spark version 2.4 and prior, and call methods of `UserDefinedFunction`, need to be re-compiled with Spark 3.0, as they are not binary compatible with Spark 3.0.
+  - Since Spark 3.0, CSV datasource uses java.time API for parsing and generating CSV content. New formatting implementation supports date/timestamp patterns conformed to ISO 8601. To switch back to the implementation used in Spark 2.4 and earlier, set `spark.sql.legacy.timeParser.enabled` to `true`.
Review comment:
@MaxGekk, can you check whether this legacy configuration actually works?
I tested it as below, and `Iso8601TimestampFormatter` was still used even with the flag set to `true`:
```diff
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala
index 2b8d22dde92..08795972fb7 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala
@@ -26,6 +26,7 @@ import scala.util.Try
 import org.apache.commons.lang3.time.FastDateFormat
+import org.apache.spark.internal.Logging
 import org.apache.spark.sql.internal.SQLConf
 sealed trait TimestampFormatter {
@@ -112,11 +113,13 @@ class LegacyFallbackTimestampFormatter(
   }
 }
-object TimestampFormatter {
+object TimestampFormatter extends Logging {
   def apply(format: String, timeZone: TimeZone, locale: Locale): TimestampFormatter = {
     if (SQLConf.get.legacyTimeParserEnabled) {
+      logError("LegacyFallbackTimestampFormatter is being used")
       new LegacyFallbackTimestampFormatter(format, timeZone, locale)
     } else {
+      logError("Iso8601TimestampFormatter is being used")
       new Iso8601TimestampFormatter(format, timeZone, locale)
     }
   }
```
```bash
$ ./bin/spark-shell --conf spark.sql.legacy.timeParser.enabled=true
```
```scala
scala> spark.conf.get("spark.sql.legacy.timeParser.enabled")
res0: String = true
scala> Seq("2010|10|10").toDF.repartition(1).write.mode("overwrite").text("/tmp/foo")
scala> spark.read.option("inferSchema", "true").option("header", "false").option("timestampFormat", "yyyy|MM|dd").csv("/tmp/foo").printSchema()
18/12/17 12:11:47 ERROR TimestampFormatter: Iso8601TimestampFormatter is being used
root
|-- _c0: timestamp (nullable = true)
```
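For context on why the check relies on the log line rather than the inferred schema: the pattern `yyyy|MM|dd` is likely accepted by both parser families, so `timestamp` would be inferred either way. Below is a minimal standalone sketch, not Spark's internal code. `PatternCheck` and its helpers are hypothetical names, and `SimpleDateFormat` is only an assumed stand-in for the `FastDateFormat`-based legacy path.

```scala
import java.text.SimpleDateFormat
import java.time.{Instant, LocalDate}
import java.time.format.DateTimeFormatter
import java.util.{Locale, TimeZone}

// Hypothetical helper object for illustration only; not part of Spark.
object PatternCheck {
  val pattern = "yyyy|MM|dd"
  val input = "2010|10|10"

  // java.time path: '|' is not a reserved pattern character,
  // so DateTimeFormatter matches it as a literal.
  def parseWithJavaTime(s: String): LocalDate =
    LocalDate.parse(s, DateTimeFormatter.ofPattern(pattern, Locale.US))

  // Legacy-style path. Assumption: SimpleDateFormat stands in for the
  // FastDateFormat-based formatter used in Spark 2.4 and earlier.
  def parseWithLegacy(s: String): Instant = {
    val sdf = new SimpleDateFormat(pattern, Locale.US)
    sdf.setTimeZone(TimeZone.getTimeZone("UTC"))
    sdf.parse(s).toInstant
  }

  def main(args: Array[String]): Unit = {
    println(parseWithJavaTime(input))
    println(parseWithLegacy(input))
  }
}
```

If both calls succeed, the schema output alone cannot tell the two formatters apart, which is why the temporary `logError` calls in the diff are what reveal the formatter actually selected.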