AMRUTH-ASHOK opened a new pull request, #53687: URL: https://github.com/apache/spark/pull/53687
Resolves Issue: #SPARK-54908

### What changes were proposed in this pull request?

In [JsonInferSchema.scala](https://github.com/apache/spark/blob/8fe006b20877671c75e4650a27d268b496294299/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala#L42):

1. No `dateFormatter` field was created (unlike CSV, where the issue does not surface).
2. `dateFormatInRead` existed in `JSONOptions` but was only used during parsing, not inference.
3. The inference code path never checked date patterns.

[CSVInferSchema.scala](https://github.com/apache/spark/blob/8fe006b20877671c75e4650a27d268b496294299/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala#L49) has a `dateFormatter` and checks for `DateType` in [tryParseDate](https://github.com/apache/spark/blob/8fe006b20877671c75e4650a27d268b496294299/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala#L203), as part of [inferField](https://github.com/apache/spark/blob/8fe006b20877671c75e4650a27d268b496294299/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala#L143).

This PR implements the same logic in [JsonInferSchema.scala](https://github.com/apache/spark/blob/8fe006b20877671c75e4650a27d268b496294299/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala#L42):

1. Added a `dateFormatter` field.
2. Updated `inferField()` to check date patterns.

### Why are the changes needed?

Inconsistency: CSV respects `dateFormat` during schema inference; JSON did not.

Type safety: dates inferred as strings lose type information and require manual casting.

A customer reported this on a support ticket with Databricks.

### Does this PR introduce _any_ user-facing change?

Yes. JSON now respects `dateFormat` during schema inference.

### How was this patch tested?

1. Added a new test case, "SPARK-54908: dateFormat option is applied during JSON schema inference", in JsonInferSchemaSuite.scala. It tests 5 cases covering basic inference, independent inference, precedence, and mixed fields:
   `mvn test -Dtest=JsonInferSchemaSuite -pl sql/catalyst`
2. Ran the full catalyst test suite to ensure no existing functionality was broken:
   `mvn test -pl sql/catalyst`
3. Scalastyle linting:
   `./dev/scalastyle`

### Was this patch authored or co-authored using generative AI tooling?

Cursor 2.3.15
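
As a rough illustration of the inference change described under "What changes were proposed", the idea of trying the configured date pattern before falling back to a string can be sketched in standalone Python. This is not Spark's code: `infer_field` and `infer_column` are hypothetical names, the type labels are plain strings, and Python's `%d/%m/%Y` style pattern stands in for the Java-style pattern (for example `dd/MM/yyyy`) that Spark's `dateFormat` option takes. Treating a column with mixed results as a string, and skipping date inference when no format is given, are assumptions made for the sketch, not confirmed behaviour of the PR.

```python
from datetime import datetime
from typing import Iterable, Optional


def infer_field(value: str, date_format: Optional[str] = None) -> str:
    """Infer a type label for one JSON string value.

    Hypothetical helper: returns "date" when a format is supplied and the
    value parses under it, otherwise falls back to "string".
    """
    if date_format is not None:
        try:
            # strptime raises ValueError when the value does not match.
            datetime.strptime(value, date_format)
            return "date"
        except ValueError:
            pass
    return "string"


def infer_column(values: Iterable[str], date_format: Optional[str] = None) -> str:
    """Merge per-value results: "date" only if every value is a date.

    Assumption for this sketch: any non-matching value (or an empty
    column) widens the column type to "string".
    """
    inferred = {infer_field(v, date_format) for v in values}
    return "date" if inferred == {"date"} else "string"
```

With this sketch, `infer_column(["01/02/2024", "15/03/2024"], "%d/%m/%Y")` yields `"date"`, while the same values without a format, or mixed with a non-date string, yield `"string"`, which mirrors the "dates inferred as strings" problem the PR describes.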
