xiaonanyang-db commented on code in PR #37933:
URL: https://github.com/apache/spark/pull/37933#discussion_r975634588


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala:
##########
@@ -183,7 +180,9 @@ class CSVOptions(
       Some(parameters.getOrElse("timestampFormat",
         s"${DateFormatter.defaultPattern}'T'HH:mm:ss.SSSXXX"))
     } else {
-      parameters.get("timestampFormat")
+      // Use Iso8601TimestampFormatter (with strict timestamp parsing) to
+      // avoid parsing dates in timestamp columns as timestamp type

Review Comment:
   Totally agree with your concerns @cloud-fan @sadikovi.
   
   After a quick discussion within my team, we agreed not to change these 
lines, to avoid unnecessary regressions and other behavior changes. Thus, 
the behavior after this PR becomes:
   - If the user provides a `timestampFormat/timestampNTZFormat`, we will strictly 
parse fields as timestamps according to that format. Thus, columns that mix 
dates and timestamps will always be inferred as `StringType`.
   - If the user specifies no `timestampFormat/timestampNTZFormat`, then for a column 
that mixes dates and timestamps:
     - If date values are before timestamp values
       - If `prefersDate=true`, the column will be inferred as `StringType`
       - otherwise
         - If the date format is supported by `Iso8601TimestampFormatter`, the 
column will be inferred as `TimestampType/TimestampNTZType`
         - otherwise, the column will be inferred as `StringType`
     - If timestamp values are before date values
       - If the date format is supported by `Iso8601TimestampFormatter`, the 
column will be inferred as `TimestampType/TimestampNTZType`
       - otherwise the column will be inferred as `StringType`
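   
   To make the cases above easier to check, here is a minimal sketch of the decision tree as a pure function. It is only a model of the outcomes listed in this comment, not the actual inference code in Spark. All names in it (`MixedColumn`, `InferenceSketch`, `inferMixedColumn`, `InferredString`, `InferredTimestamp`) are hypothetical, and `dateParsableAsIso8601` stands in for "the date format is supported by `Iso8601TimestampFormatter`". It also assumes the date values do not match a user-provided timestamp format.

```scala
// Hypothetical model of the outcomes described in this comment; not Spark code.
sealed trait Inferred
case object InferredString extends Inferred    // stands for StringType
case object InferredTimestamp extends Inferred // stands for a timestamp type (with or without NTZ)

// Describes a column that mixes date and timestamp values.
final case class MixedColumn(
    userTimestampFormat: Option[String], // timestampFormat/timestampNTZFormat, if the user set one
    prefersDate: Boolean,                // value of the prefersDate option
    datesComeFirst: Boolean,             // true if date values appear before timestamp values
    dateParsableAsIso8601: Boolean)      // assumption: the default formatter accepts the date strings

object InferenceSketch {
  def inferMixedColumn(c: MixedColumn): Inferred =
    if (c.userTimestampFormat.isDefined) {
      // Strict parsing with the user's format: dates do not match, so fall back to string.
      InferredString
    } else if (c.datesComeFirst && c.prefersDate) {
      // Dates seen first with prefersDate=true: the later timestamps force a fallback to string.
      InferredString
    } else if (c.dateParsableAsIso8601) {
      // The default formatter can read the date values as timestamps.
      InferredTimestamp
    } else {
      InferredString
    }
}
```

   Note that in this model `prefersDate` only matters when the date values come first; when timestamps come first, the outcome depends solely on whether the date strings can be parsed by the default formatter.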
   
   Does this make sense to you? @sadikovi @cloud-fan 
   
   cc @brkyvz @Yaohua628 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

