Re: [PR] [SPARK-46769][SQL] Refine timestamp related schema inference [spark]

via GitHub Fri, 19 Jan 2024 10:24:37 -0800


gengliangwang commented on code in PR #44800:
URL: https://github.com/apache/spark/pull/44800#discussion_r1459496728



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala:
##########
@@ -159,16 +160,21 @@ class JsonInferSchema(options: JSONOptions) extends Serializable with Logging {
           val bigDecimal = decimalParser(field)
             DecimalType(bigDecimal.precision, bigDecimal.scale)
         }
-        val timestampType = SQLConf.get.timestampType
         if (options.prefersDecimal && decimalTry.isDefined) {
           decimalTry.get
-        } else if (options.inferTimestamp && (SQLConf.get.legacyTimeParserPolicy ==
-          LegacyBehaviorPolicy.LEGACY || timestampType == TimestampNTZType) &&
+        } else if (options.inferTimestamp) {
+          // For text-based format, it's ambiguous to infer a timestamp string without timezone, as
+          // it can be both TIMESTAMP LTZ and NTZ. To avoid behavior changes with the new support
+          // of NTZ, here we only try to infer NTZ if the config is set to use NTZ by default, or
+          // the NTZ timestamp format is set in the parsing options.
+          if ((isDefaultNTZ || options.timestampNTZFormatInRead.isDefined) &&

Review Comment:
   This seems to be a behavior change. Before this PR, `options.timestampNTZFormatInRead` did not affect the inference result. Shall we simply change the condition to the following?
   ```
   if (isDefaultNTZ && timestampNTZFormatter.parseWithoutTimeZoneOptional(field, false).isDefined)
   ```
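To make the difference between the two conditions concrete, here is a minimal, hypothetical sketch. It is not Spark's code. It models the inferred types as plain strings and uses `java.time` in place of Spark's own timestamp formatters. The names `TimestampInferenceSketch`, `parseNtz`, `parseLtz`, `inferAsInPr` and `inferAsProposed` are invented for illustration. It also assumes three things the diff does not show: the LTZ fallback, the UTC session time zone, and that the NTZ parser honors a user-supplied read format. The diff's condition is cut off after `&&`, so the sketch assumes that it continues with an NTZ parse attempt.

```scala
import java.time.{Instant, LocalDateTime, OffsetDateTime, ZoneId, ZoneOffset}
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical, simplified model of the timestamp branch of JSON schema
// inference. NOT Spark's implementation: Spark returns DataType objects and
// uses its own formatter classes; here java.time stands in for them.
object TimestampInferenceSketch {

  // Assumed session time zone, used to read zone-less strings as LTZ.
  val sessionZone: ZoneId = ZoneOffset.UTC

  // NTZ stand-in: succeeds only for strings without a zone/offset. An optional
  // pattern (modelling timestampNTZFormatInRead) replaces the ISO default.
  def parseNtz(field: String, pattern: Option[String]): Option[LocalDateTime] = {
    val formatter = pattern
      .map(p => DateTimeFormatter.ofPattern(p))
      .getOrElse(DateTimeFormatter.ISO_LOCAL_DATE_TIME)
    Try(LocalDateTime.parse(field, formatter)).toOption
  }

  // LTZ stand-in: accepts an explicit offset, or falls back to interpreting a
  // zone-less string in the session time zone. This fallback is what makes a
  // string like "2024-01-19T10:24:37" ambiguous between LTZ and NTZ.
  def parseLtz(field: String): Option[Instant] =
    Try(OffsetDateTime.parse(field).toInstant).toOption
      .orElse(parseNtz(field, None).map(_.atZone(sessionZone).toInstant))

  // Shared decision order: try NTZ only when allowed, then LTZ, else string.
  private def inferWith(ntzAllowed: Boolean, field: String, pattern: Option[String]): String =
    if (ntzAllowed && parseNtz(field, pattern).isDefined) "TIMESTAMP_NTZ"
    else if (parseLtz(field).isDefined) "TIMESTAMP"
    else "STRING"

  // Condition as in the diff: a configured NTZ read format alone enables NTZ inference.
  def inferAsInPr(field: String, isDefaultNTZ: Boolean, ntzFormatInRead: Option[String]): String =
    inferWith(isDefaultNTZ || ntzFormatInRead.isDefined, field, ntzFormatInRead)

  // Condition suggested in the review: only the session default enables NTZ inference.
  def inferAsProposed(field: String, isDefaultNTZ: Boolean, ntzFormatInRead: Option[String]): String =
    inferWith(isDefaultNTZ, field, ntzFormatInRead)
}
```

In this model, take a zone-less string such as `"2024-01-19T10:24:37"`, with `isDefaultNTZ = false` and only a custom NTZ read format set. `inferAsInPr` yields `TIMESTAMP_NTZ`, while `inferAsProposed` falls back to `TIMESTAMP`. That flip, triggered by the mere presence of the format option, is the kind of behavior change the review comment points out.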



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
