Github user sergey-rubtsov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20140#discussion_r166261313
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala ---
    @@ -150,6 +151,16 @@ class CSVOptions(
     
       val isCommentSet = this.comment != '\u0000'
     
    +  def dateFormatter: DateTimeFormatter = {
    +    DateTimeFormatter.ofPattern(dateFormat.getPattern)
    +      .withLocale(Locale.US).withZone(timeZone.toZoneId).withResolverStyle(ResolverStyle.SMART)
    +  }
    +
    +  def timestampFormatter: DateTimeFormatter = {
    +    DateTimeFormatter.ofPattern(timestampFormat.getPattern)
    --- End diff --
    
    DateTimeFormatter is part of java.time, the standard date and time library
    introduced in Java 8. FastDateFormat can't parse dates and timestamps properly.
    
    I can create some test cases to prove it, but that will take me a lot of time.
    
    Also, FastDateFormat does not conform to ISO 8601:
    https://en.wikipedia.org/wiki/ISO_8601
    The current implementation of CSVInferSchema contains other bugs. For example,
    the test "Timestamp field types are inferred correctly via custom date format"
    in CSVInferSchemaSuite should not pass, because the timestampFormat "yyyy-mm"
    is the wrong pattern for year and month: lowercase "mm" means minutes, not
    months. It should be "yyyy-MM".
    It would be better to refactor the date types and replace the deprecated
    types with the new ones across the whole project.
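
    To illustrate the "yyyy-mm" versus "yyyy-MM" point, here is a minimal sketch
    that uses only java.time and does not touch the Spark test itself. The names
    PatternLetters, monthPattern, minutePattern and sample are hypothetical and
    chosen only for this illustration.

```scala
import java.time.{LocalDateTime, YearMonth}
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical demo object, not part of Spark.
object PatternLetters {
  // "MM" is month-of-year, "mm" is minute-of-hour.
  val monthPattern: DateTimeFormatter = DateTimeFormatter.ofPattern("yyyy-MM")
  val minutePattern: DateTimeFormatter = DateTimeFormatter.ofPattern("yyyy-mm")

  // Arbitrary sample value: 2015-08-20 14:57.
  val sample: LocalDateTime = LocalDateTime.of(2015, 8, 20, 14, 57)

  def main(args: Array[String]): Unit = {
    // Formatting shows which field each pattern letter reads.
    println(sample.format(monthPattern))  // 2015-08 (year and month)
    println(sample.format(minutePattern)) // 2015-57 (year and minute)

    // Parsing "2015-08" as a year-month works only with "yyyy-MM";
    // with "yyyy-mm" the 08 is read as a minute, so no month is available.
    println(Try(YearMonth.parse("2015-08", monthPattern)).isSuccess)  // true
    println(Try(YearMonth.parse("2015-08", minutePattern)).isFailure) // true
  }
}
```

    The same letter meanings apply to the SimpleDateFormat-style patterns that
    FastDateFormat accepts, which is why "yyyy-mm" cannot describe a year and
    month.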

