xiaonanyang-db commented on code in PR #37933: URL: https://github.com/apache/spark/pull/37933#discussion_r975634588
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala:
##########
@@ -183,7 +180,9 @@ class CSVOptions(
         Some(parameters.getOrElse("timestampFormat",
           s"${DateFormatter.defaultPattern}'T'HH:mm:ss.SSSXXX"))
       } else {
-        parameters.get("timestampFormat")
+        // Use Iso8601TimestampFormatter (with strict timestamp parsing) to
+        // avoid parsing dates in timestamp columns as timestamp type

Review Comment:
Totally agree with your concerns @cloud-fan @sadikovi. After a quick discussion within my team, we agreed not to change these lines, to avoid unnecessary regressions and any other behavior changes.

Thus, the behavior after this PR becomes:
- If the user provides a `timestampFormat/timestampNTZFormat`, we will strictly parse fields as timestamps according to that format. Thus, columns that mix dates and timestamps will always be inferred as `StringType`.
- If the user specifies no `timestampFormat/timestampNTZFormat`, then for a column that mixes dates and timestamps:
  - If date values come before timestamp values:
    - If `prefersDate=true`, the column will be inferred as `StringType`.
    - Otherwise:
      - If the date format is supported by `Iso8601TimestampFormatter`, the column will be inferred as `TimestampType/TimestampNTZType`.
      - Otherwise, the column will be inferred as `StringType`.
  - If timestamp values come before date values:
    - If the date format is supported by `Iso8601TimestampFormatter`, the column will be inferred as `TimestampType/TimestampNTZType`.
    - Otherwise, the column will be inferred as `StringType`.

Does this make sense to you? @sadikovi @cloud-fan

cc @brkyvz @Yaohua628

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
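
For illustration only, the inference rules listed in the review comment above can be sketched as a small decision function. This is not Spark's implementation: `MixedColumnInference`, `Scenario`, `InferredString`, and `InferredTimestamp` are hypothetical names, and `InferredTimestamp` stands in for either the timestamp or timestamp-NTZ type. The sketch assumes the column under consideration mixes date and timestamp values.

```scala
// Hypothetical model of the rules described in the comment; not Spark code.
object MixedColumnInference {
  sealed trait Inferred
  case object InferredString extends Inferred
  // Stands in for either the timestamp or the timestamp-NTZ type.
  case object InferredTimestamp extends Inferred

  // Describes one column that mixes date and timestamp values.
  final case class Scenario(
      userTimestampFormat: Boolean, // user set timestampFormat/timestampNTZFormat
      datesFirst: Boolean,          // date values appear before timestamp values
      prefersDate: Boolean,         // the prefersDate option
      dateSupportedByIso8601: Boolean // Iso8601TimestampFormatter accepts the date format
  )

  def infer(s: Scenario): Inferred =
    if (s.userTimestampFormat) {
      // Strict parsing with the user's format: dates never parse as timestamps.
      InferredString
    } else if (s.datesFirst && s.prefersDate) {
      // Dates first with prefersDate=true: the column falls back to string.
      InferredString
    } else if (s.dateSupportedByIso8601) {
      // The default formatter can also read the date values as timestamps.
      InferredTimestamp
    } else {
      InferredString
    }
}
```

Under these assumptions, the order of values only matters when `prefersDate=true`; in every other case without a user-provided format, the result depends solely on whether `Iso8601TimestampFormatter` accepts the date format.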