[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36871: [SPARK-39469][SQL] Infer date type for CSV schema inference

GitBox Wed, 15 Jun 2022 21:58:51 -0700


HyukjinKwon commented on code in PR #36871:
URL: https://github.com/apache/spark/pull/36871#discussion_r898692164



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala:
##########
@@ -169,6 +174,14 @@ class CSVInferSchema(val options: CSVOptions) extends Serializable {
   private def tryParseDouble(field: String): DataType = {
     if ((allCatch opt field.toDouble).isDefined || isInfOrNan(field)) {
       DoubleType
+    } else {
+      tryParseDateTime(field)
+    }
+  }
+
+  private def tryParseDateTime(field: String): DataType = {
+    if ((allCatch opt dateFormatter.parse(field)).isDefined) {

Review Comment:
   We should probably 1. add either a SQL configuration or an option, e.g., 
`infersDate` (like `prefersDecimal` in the JSON options), or 2. only infer date 
types when `dateFormat` is specified. The main idea is to avoid a perf 
regression in schema inference from adding this.
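
   A minimal sketch of the first suggestion, to make the cost argument concrete: the date parse is only attempted when an opt-in flag is set, so the default inference path does no extra work per field. This is not the Spark implementation. `InferOptions`, `InferSketch`, the `inferDate` flag name, and the `InferredType` objects are hypothetical stand-ins for `CSVOptions`, `CSVInferSchema`, and Spark's `DataType`, and the fallback to a string type simplifies the real inference chain.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical stand-ins for Spark's DataType hierarchy (not the real classes).
sealed trait InferredType
case object DoubleT extends InferredType
case object DateT extends InferredType
case object StringT extends InferredType

// `inferDate` is an illustrative flag name following the review suggestion;
// it defaults to false so existing behavior and cost are unchanged.
final case class InferOptions(
    inferDate: Boolean = false,
    dateFormat: String = "yyyy-MM-dd")

final class InferSketch(options: InferOptions) {
  // Lazy, so no formatter is built unless date inference is enabled and reached.
  private lazy val dateFormatter = DateTimeFormatter.ofPattern(options.dateFormat)

  def tryParseDouble(field: String): InferredType =
    if (Try(field.toDouble).isSuccess) DoubleT
    else if (options.inferDate) tryParseDate(field) // opt-in: extra parse per field
    else StringT                                    // default: no date parsing at all

  private def tryParseDate(field: String): InferredType =
    if (Try(LocalDate.parse(field, dateFormatter)).isSuccess) DateT else StringT
}
```

   With the default `InferOptions()`, a field such as `2022-06-15` is never handed to the date formatter and infers as a string; only `InferOptions(inferDate = true)` pays for the additional parse attempt.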
   



