Github user sergey-rubtsov commented on a diff in the pull request:
https://github.com/apache/spark/pull/21363#discussion_r189588135
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala
---
@@ -140,14 +141,23 @@ private[csv] object CSVInferSchema {
private def tryParseDouble(field: String, options: CSVOptions): DataType = {
if ((allCatch opt field.toDouble).isDefined || isInfOrNan(field, options)) {
DoubleType
+ } else {
+ tryParseDate(field, options)
--- End diff --
For example, suppose the "timestampFormat" and "dateFormat" options are
set to the same pattern by mistake, say "yyyy-MM-dd".
'TimestampType' (8 bytes) is larger than 'DateType' (4 bytes).
So if the two formats can overlap, we should try to parse the field as a
date first. Both types fit such a value, but inference should prefer the
more compact type by default, and that gives the correct inferred type.
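
To make the ordering concrete, here is a minimal standalone sketch of "try date first, then timestamp". It is not the code in this PR: `DateFirstInference`, `infer`, `matches`, and the `DateLike` / `TimestampLike` / `StringLike` markers are hypothetical names, and it uses `java.text.SimpleDateFormat` rather than whatever parser `CSVOptions` provides. It also assumes a field only counts as a match when the pattern consumes the whole string, so a full timestamp is not mistaken for a date.

```scala
import java.text.{ParsePosition, SimpleDateFormat}
import java.util.TimeZone

// Hypothetical illustration only; not Spark's CSVInferSchema.
object DateFirstInference {
  // Stand-ins for DateType, TimestampType and StringType.
  sealed trait Inferred
  case object DateLike extends Inferred
  case object TimestampLike extends Inferred
  case object StringLike extends Inferred

  // True only if the pattern parses the field and consumes all of it.
  private def matches(field: String, pattern: String): Boolean = {
    val fmt = new SimpleDateFormat(pattern)
    fmt.setLenient(false)
    fmt.setTimeZone(TimeZone.getTimeZone("UTC"))
    val pos = new ParsePosition(0)
    fmt.parse(field, pos) != null && pos.getIndex == field.length
  }

  // Try the more compact type (date) before the wider one (timestamp).
  def infer(field: String, dateFormat: String, timestampFormat: String): Inferred =
    if (matches(field, dateFormat)) DateLike
    else if (matches(field, timestampFormat)) TimestampLike
    else StringLike
}
```

With both options set to "yyyy-MM-dd", a field such as "2018-05-21" matches either pattern, and the date-first order picks the smaller type. A field with a time part still falls through to the timestamp check because the date pattern does not consume the whole string.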
---