Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/23201#discussion_r240000411
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala
---
@@ -121,7 +122,26 @@ private[sql] class JsonInferSchema(options:
JSONOptions) extends Serializable {
DecimalType(bigDecimal.precision, bigDecimal.scale)
}
decimalTry.getOrElse(StringType)
- case VALUE_STRING => StringType
+ case VALUE_STRING =>
+ val stringValue = parser.getText
--- End diff --
The order can matter if you have the same (or a similar) pattern for dates
and timestamps. `DateType` can be preferable because it requires less memory.
It seems reasonable to widen from `DateType` to `TimestampType` during
schema inference, since the opposite direction is impossible without losing
information.
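
To illustrate the ordering idea, here is a minimal sketch. It is not the code in this PR: `DateT`, `TimestampT`, `StringT`, and `WideningSketch` are hypothetical stand-ins for the Spark types, and the ISO-8601 `java.time` parsers are assumed in place of the patterns configured in `JSONOptions`. Each value is tried as the narrower date type first, and merging across rows only ever widens from date to timestamp.

```scala
import java.time.{LocalDate, LocalDateTime}
import scala.util.Try

// Hypothetical stand-ins for Spark's DateType / TimestampType / StringType.
sealed trait InferredType
case object DateT extends InferredType
case object TimestampT extends InferredType
case object StringT extends InferredType

object WideningSketch {
  // Try the narrower type first, then the wider one, then fall back to string.
  // Assumption: ISO-8601 input, not the user-configurable patterns.
  def inferField(s: String): InferredType =
    if (Try(LocalDate.parse(s)).isSuccess) DateT
    else if (Try(LocalDateTime.parse(s)).isSuccess) TimestampT
    else StringT

  // Merge the types inferred for two rows. A date can be widened to a
  // timestamp without losing information; any other mismatch becomes string.
  def merge(a: InferredType, b: InferredType): InferredType = (a, b) match {
    case (x, y) if x == y => x
    case (DateT, TimestampT) | (TimestampT, DateT) => TimestampT
    case _ => StringT
  }

  // Infer a single type for a whole column of raw string values.
  def inferColumn(values: Seq[String]): InferredType =
    values.map(inferField).reduceOption(merge).getOrElse(StringT)
}
```

With this sketch, a column holding only values like `2018-12-08` stays a date, and it is widened to a timestamp once a value like `2018-12-08T10:15:30` shows up.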
---