Github user sergey-rubtsov commented on a diff in the pull request:
https://github.com/apache/spark/pull/20140#discussion_r166261313
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala
---
@@ -150,6 +151,16 @@ class CSVOptions(
val isCommentSet = this.comment != '\u0000'
+ def dateFormatter: DateTimeFormatter = {
+ DateTimeFormatter.ofPattern(dateFormat.getPattern)
+ .withLocale(Locale.US).withZone(timeZone.toZoneId).withResolverStyle(ResolverStyle.SMART)
+ }
+
+ def timestampFormatter: DateTimeFormatter = {
+ DateTimeFormatter.ofPattern(timestampFormat.getPattern)
--- End diff --
DateTimeFormatter is part of java.time, the standard date and time library
since Java 8. FastDateFormat does not parse dates and timestamps correctly
in all cases.
I can write some test cases to demonstrate this, but that will take me a
fair amount of time.
Also, FastDateFormat does not conform to ISO 8601:
https://en.wikipedia.org/wiki/ISO_8601
The current implementation of CSVInferSchema contains other bugs. For example,
the test test("Timestamp field types are inferred correctly via custom date
format") in class CSVInferSchemaSuite should not pass, because the
timestampFormat "yyyy-mm" is the wrong pattern for year and month: lowercase
"mm" means minute-of-hour. It should be "yyyy-MM".
It would be better to refactor the date handling across the whole project
and replace the old date types with the new java.time ones.
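
To illustrate the "yyyy-mm" versus "yyyy-MM" point, here is a minimal sketch.
It assumes java.text.SimpleDateFormat as a stand-in for FastDateFormat (which
follows largely the same pattern letters), and the object and method names are
hypothetical, not part of Spark. The lenient legacy parser accepts "yyyy-mm"
and reads "08" as minutes, while resolving the same text into a
java.time.YearMonth fails because no month was parsed.

```scala
import java.text.SimpleDateFormat
import java.time.YearMonth
import java.time.format.DateTimeFormatter
import java.util.{Locale, TimeZone}
import scala.util.Try

// Hypothetical helper object, for illustration only.
object PatternLetters {
  private val utc = TimeZone.getTimeZone("UTC")

  // Parse with the legacy API, then print the result with an unambiguous
  // pattern so the fields that were actually set become visible.
  def legacyParse(pattern: String, text: String): String = {
    val parser = new SimpleDateFormat(pattern, Locale.US)
    parser.setTimeZone(utc)
    val printer = new SimpleDateFormat("yyyy-MM-dd HH:mm", Locale.US)
    printer.setTimeZone(utc)
    printer.format(parser.parse(text))
  }

  // Parse with java.time and require a year and a month in the result.
  def strictParse(pattern: String, text: String): Try[YearMonth] =
    Try(YearMonth.parse(text, DateTimeFormatter.ofPattern(pattern, Locale.US)))
}
```

With this sketch, PatternLetters.legacyParse("yyyy-mm", "2015-08") silently
yields a January timestamp at minute 8, which is why a test using that pattern
can pass, whereas PatternLetters.strictParse("yyyy-mm", "2015-08") is a
Failure and the "yyyy-MM" variant succeeds.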
---