gengliangwang commented on code in PR #44800:
URL: https://github.com/apache/spark/pull/44800#discussion_r1459496728
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala:
##########
@@ -159,16 +160,21 @@ class JsonInferSchema(options: JSONOptions) extends Serializable with Logging {
val bigDecimal = decimalParser(field)
DecimalType(bigDecimal.precision, bigDecimal.scale)
}
- val timestampType = SQLConf.get.timestampType
if (options.prefersDecimal && decimalTry.isDefined) {
decimalTry.get
- } else if (options.inferTimestamp && (SQLConf.get.legacyTimeParserPolicy ==
- LegacyBehaviorPolicy.LEGACY || timestampType == TimestampNTZType) &&
+ } else if (options.inferTimestamp) {
+ // For text-based format, it's ambiguous to infer a timestamp string without timezone, as
+ // it can be both TIMESTAMP LTZ and NTZ. To avoid behavior changes with the new support
+ // of NTZ, here we only try to infer NTZ if the config is set to use NTZ by default, or
+ // the NTZ timestamp format is set in the parsing options.
+ if ((isDefaultNTZ || options.timestampNTZFormatInRead.isDefined) &&
Review Comment:
This seems to be a behavior change. Before this PR,
`options.timestampNTZFormatInRead` did not affect the inference result. Shall
we simply change the condition to the following?
```
if (isDefaultNTZ &&
timestampNTZFormatter.parseWithoutTimeZoneOptional(field, false).isDefined)
```
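
To illustrate the concern, here is a minimal standalone sketch (not Spark code) comparing the two conditions. It assumes the truncated second half of the PR's condition is the same `parseWithoutTimeZoneOptional` check as in the suggestion above. The names `NtzInferenceSketch`, `parseWithoutTimeZone`, `triesNtzInPr`, and `triesNtzSuggested` are hypothetical, and the parser is a plain `java.time` stand-in for `timestampNTZFormatter`.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import scala.util.Try

object NtzInferenceSketch {
  // Hypothetical stand-in for
  // timestampNTZFormatter.parseWithoutTimeZoneOptional(field, false):
  // succeeds only for ISO local date-times that carry no zone or offset.
  def parseWithoutTimeZone(field: String): Option[LocalDateTime] =
    Try(LocalDateTime.parse(field, DateTimeFormatter.ISO_LOCAL_DATE_TIME)).toOption

  // Condition as proposed in the PR (second half assumed): NTZ inference is
  // attempted when NTZ is the default type OR a read-side NTZ format is set.
  def triesNtzInPr(
      isDefaultNTZ: Boolean,
      timestampNTZFormatInRead: Option[String],
      field: String): Boolean =
    (isDefaultNTZ || timestampNTZFormatInRead.isDefined) &&
      parseWithoutTimeZone(field).isDefined

  // Condition suggested in this review: only the default timestamp type matters.
  def triesNtzSuggested(isDefaultNTZ: Boolean, field: String): Boolean =
    isDefaultNTZ && parseWithoutTimeZone(field).isDefined
}
```

With `isDefaultNTZ = false` and a read-side NTZ format set, `triesNtzInPr` returns `true` for a field such as `2024-01-19T10:15:30`, while `triesNtzSuggested` returns `false`. In this simplified model, that is the one case where the two conditions diverge.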
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]