HyukjinKwon commented on a change in pull request #23150: [SPARK-26178][SQL] Use java.time API for parsing timestamps and dates from CSV
URL: https://github.com/apache/spark/pull/23150#discussion_r242021998
 
 

 ##########
 File path: docs/sql-migration-guide-upgrade.md
 ##########
 @@ -33,6 +33,8 @@ displayTitle: Spark SQL Upgrading Guide
 
  - Spark applications which are built with Spark version 2.4 and prior, and call methods of `UserDefinedFunction`, need to be re-compiled with Spark 3.0, as they are not binary compatible with Spark 3.0.
 
 +  - Since Spark 3.0, the CSV datasource uses the java.time API for parsing and generating CSV content. The new formatting implementation supports date/timestamp patterns conforming to ISO 8601. To switch back to the implementation used in Spark 2.4 and earlier, set `spark.sql.legacy.timeParser.enabled` to `true`.
 
 Review comment:
   @MaxGekk, can you check if this legacy configuration works or not?
   
   I checked it as below:
   
   ```diff
   diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala
   index 2b8d22dde92..08795972fb7 100644
   --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala
   +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala
   @@ -26,6 +26,7 @@ import scala.util.Try
   
    import org.apache.commons.lang3.time.FastDateFormat
   
   +import org.apache.spark.internal.Logging
    import org.apache.spark.sql.internal.SQLConf
   
    sealed trait TimestampFormatter {
   @@ -112,11 +113,13 @@ class LegacyFallbackTimestampFormatter(
      }
    }
   
   -object TimestampFormatter {
   +object TimestampFormatter extends Logging {
     def apply(format: String, timeZone: TimeZone, locale: Locale): TimestampFormatter = {
        if (SQLConf.get.legacyTimeParserEnabled) {
   +      logError("LegacyFallbackTimestampFormatter is being used")
          new LegacyFallbackTimestampFormatter(format, timeZone, locale)
        } else {
   +      logError("Iso8601TimestampFormatter is being used")
          new Iso8601TimestampFormatter(format, timeZone, locale)
        }
      }
   ```
   
   
   ```bash
   $ ./bin/spark-shell --conf spark.sql.legacy.timeParser.enabled=true
   ```
   
   
   ```scala
   scala> spark.conf.get("spark.sql.legacy.timeParser.enabled")
   res0: String = true
   
   scala> Seq("2010|10|10").toDF.repartition(1).write.mode("overwrite").text("/tmp/foo")
   
   scala> spark.read.option("inferSchema", "true").option("header", "false").option("timestampFormat", "yyyy|MM|dd").csv("/tmp/foo").printSchema()
   18/12/17 12:11:47 ERROR TimestampFormatter: Iso8601TimestampFormatter is being used
   root
    |-- _c0: timestamp (nullable = true)
   ```
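   For context on what differs between the two code paths: the new formatter resolves patterns through java.time's `DateTimeFormatter`, while the legacy one follows `SimpleDateFormat` semantics (Spark wraps it in commons-lang3's `FastDateFormat`). A minimal standalone Java sketch of the two parser families, outside Spark (the class name is illustrative):

   ```java
   import java.text.ParseException;
   import java.text.SimpleDateFormat;
   import java.time.LocalDate;
   import java.time.format.DateTimeFormatter;

   public class TimeParserSketch {
       public static void main(String[] args) throws ParseException {
           // New path (Spark 3.0+): java.time treats non-letter characters
           // such as '|' as pattern literals, so "yyyy|MM|dd" parses directly.
           DateTimeFormatter isoFormatter = DateTimeFormatter.ofPattern("yyyy|MM|dd");
           LocalDate date = LocalDate.parse("2010|10|10", isoFormatter);
           System.out.println(date); // prints 2010-10-10

           // Legacy path (Spark 2.4 and earlier): SimpleDateFormat semantics,
           // which also treat '|' as a literal, returning a java.util.Date.
           SimpleDateFormat legacyFormatter = new SimpleDateFormat("yyyy|MM|dd");
           System.out.println(legacyFormatter.parse("2010|10|10") != null); // prints true
       }
   }
   ```

   Both parsers accept this particular pattern, which is why the test above has to rely on the injected log line rather than a parse failure to tell which implementation was picked.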
   
   
   
   
