[GitHub] [spark] SaurabhChawla100 commented on a change in pull request #32558: [SPARK-34953][CORE][SQL] Add the code change for adding the DateType in the infer schema while reading in CSV and JSON

GitBox Wed, 02 Jun 2021 01:21:51 -0700


SaurabhChawla100 commented on a change in pull request #32558:
URL: https://github.com/apache/spark/pull/32558#discussion_r643760281




##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala
##########
@@ -160,6 +168,16 @@ class CSVInferSchema(val options: CSVOptions) extends Serializable {
   private def tryParseDouble(field: String): DataType = {
     if ((allCatch opt field.toDouble).isDefined || isInfOrNan(field)) {
       DoubleType
+    } else {
+      tryParseDateFormat(field)
+    }
+  }
+
+  private def tryParseDateFormat(field: String): DataType = {
+    if (options.inferDateType
+      && !dateFormatter.isInstanceOf[LegacySimpleDateFormatter]

Review comment:
       It has to be LegacyFastDateFormatter; I missed changing it. Previously I
was using the SimpleDateFormatter, so I added this LegacySimpleDateFormatter
check. Now that we are using the FastDateFormatter, it has to be
LegacyFastDateFormatter. Making that change.
   
   If legacy is on, we have ambiguity in DateType pattern matching, because
the patterns can be set arbitrarily by users.
   The legacy formatter does not do an exact match, which means it is not going
to distinguish yyyy-MM from yyyy-MM-dd for an input such as 2010-10-10.
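   
   A minimal sketch of that ambiguity, assuming the legacy formatters behave
like java.text.SimpleDateFormat on this point (its parse(String) may stop
before the end of the input), compared with java.time.format.DateTimeFormatter,
which requires the whole input to match. The object and method names are
hypothetical and are not part of this PR.

```scala
import java.text.SimpleDateFormat
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical helper names, for illustration only; not part of the PR.
object LenientVsStrictDemo {
  // Legacy-style parsing: SimpleDateFormat.parse(String) may not use the
  // entire input, so text after the matched prefix is silently ignored.
  def legacyMatches(pattern: String, input: String): Boolean =
    Try(new SimpleDateFormat(pattern).parse(input)).isSuccess

  // java.time parsing: DateTimeFormatter.parse(CharSequence) fails when any
  // part of the input is left unparsed.
  def strictMatches(pattern: String, input: String): Boolean =
    Try(DateTimeFormatter.ofPattern(pattern).parse(input)).isSuccess

  def main(args: Array[String]): Unit = {
    val input = "2010-10-10"
    Seq("yyyy-MM", "yyyy-MM-dd").foreach { p =>
      println(s"$p legacy=${legacyMatches(p, input)} strict=${strictMatches(p, input)}")
    }
  }
}
```

   Under that assumption, the legacy-style check accepts 2010-10-10 for both
yyyy-MM and yyyy-MM-dd, while the strict check accepts it only for yyyy-MM-dd,
which is why DateType inference is skipped when the legacy formatter is in use.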
   



