Github user sergey-rubtsov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21363#discussion_r189588135
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
    @@ -140,14 +141,23 @@ private[csv] object CSVInferSchema {
       private def tryParseDouble(field: String, options: CSVOptions): DataType = {
         if ((allCatch opt field.toDouble).isDefined || isInfOrNan(field, options)) {
           DoubleType
    +    } else {
    +      tryParseDate(field, options)
    --- End diff --
    
    For example, suppose that by mistake the "timestampFormat" and "dateFormat" options are identical, say "yyyy-MM-dd".
    'TimestampType' (8 bytes) is larger than 'DateType' (4 bytes).
    So if the two formats can overlap, we need to try parsing the field as a date first. Both types fit such a value, but inference should default to the more compact one, and that gives the correct inferred type.
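
The ordering argument above can be sketched in plain Scala without Spark. This is only an illustration under stated assumptions: `DateFirstSketch`, `matches`, `infer` and the `Inferred*` objects are hypothetical names, and `java.text.SimpleDateFormat` stands in for whatever parser CSVInferSchema actually uses.

```scala
import java.text.{ParsePosition, SimpleDateFormat}
import java.util.TimeZone

// Hypothetical stand-in for the inference chain; not Spark's CSVInferSchema.
object DateFirstSketch {
  sealed trait Inferred
  case object InferredDate extends Inferred      // plays the role of DateType (4 bytes)
  case object InferredTimestamp extends Inferred // plays the role of TimestampType (8 bytes)
  case object InferredString extends Inferred    // fallback when nothing matches

  // True only if the whole field is consumed by the pattern.
  def matches(field: String, pattern: String): Boolean = {
    val fmt = new SimpleDateFormat(pattern)
    fmt.setLenient(false)
    fmt.setTimeZone(TimeZone.getTimeZone("UTC"))
    val pos = new ParsePosition(0)
    // parse returns null on failure instead of throwing
    fmt.parse(field, pos) != null && pos.getIndex == field.length
  }

  // Try the more compact type first, then fall back to the wider one.
  def infer(field: String, dateFormat: String, timestampFormat: String): Inferred =
    if (matches(field, dateFormat)) InferredDate
    else if (matches(field, timestampFormat)) InferredTimestamp
    else InferredString
}
```

With identical formats, `DateFirstSketch.infer("2018-05-21", "yyyy-MM-dd", "yyyy-MM-dd")` picks the date variant, because the date check runs first. A value that carries a time part still falls through to the timestamp check when "timestampFormat" includes one.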

