AMRUTH-ASHOK opened a new pull request, #53687:
URL: https://github.com/apache/spark/pull/53687

   Resolves Issue: #SPARK-54908
   
   ### What changes were proposed in this pull request?
   In 
[JsonInferSchema.scala](https://github.com/apache/spark/blob/8fe006b20877671c75e4650a27d268b496294299/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala#L42)
   1. No dateFormatter field was created (unlike in CSV inference, where the 
issue does not surface).
   2. dateFormatInRead existed in JSONOptions but was only used during parsing, 
not inference.
   3. The inference code path never checked string values against date patterns.
   
   
[CSVInferSchema.scala](https://github.com/apache/spark/blob/8fe006b20877671c75e4650a27d268b496294299/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala#L49)
 has a dateFormatter and checks for DateType in 
[tryParseDate](https://github.com/apache/spark/blob/8fe006b20877671c75e4650a27d268b496294299/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala#L203)
 in 
[inferField](https://github.com/apache/spark/blob/8fe006b20877671c75e4650a27d268b496294299/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala#L143)
   
   This PR implements the same logic in 
[JsonInferSchema.scala](https://github.com/apache/spark/blob/8fe006b20877671c75e4650a27d268b496294299/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala#L42)
   1. Added dateFormatter field
   2. Updated inferField() to check date patterns
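
The idea behind the two changes above can be sketched roughly as follows. This is a minimal, standalone illustration, not the code in this PR: `InferredType`, `DateLike`, `StringLike`, and `inferStringField` are hypothetical names, and plain `java.time` stands in for Spark's own date formatter. It assumes a string value is treated as a date only when the whole value parses with the configured pattern, and falls back to a string otherwise.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

object DateInferenceSketch {
  // Hypothetical stand-ins for Spark's DateType and StringType.
  sealed trait InferredType
  case object DateLike extends InferredType
  case object StringLike extends InferredType

  // Hypothetical helper: infer the type of a single JSON string value.
  // If no pattern is configured, or the value (or the pattern itself)
  // fails to parse, the value stays a string.
  def inferStringField(value: String, dateFormat: Option[String]): InferredType = {
    val parsesAsDate = dateFormat.exists { pattern =>
      Try(LocalDate.parse(value, DateTimeFormatter.ofPattern(pattern))).isSuccess
    }
    if (parsesAsDate) DateLike else StringLike
  }
}
```

For example, `DateInferenceSketch.inferStringField("31/12/2025", Some("dd/MM/yyyy"))` yields `DateLike`, while the same value with no configured pattern, or a value such as `"hello"`, yields `StringLike`.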
   
   ### Why are the changes needed?
   Inconsistency: CSV respects dateFormat during schema inference; JSON did not.
   Type safety: Dates inferred as strings lose type information and require 
manual casting. A customer reported this in a Databricks support ticket.
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. JSON schema inference now respects the dateFormat option, so values 
matching the pattern are no longer inferred as strings.
   
   
   ### How was this patch tested?
   1. Added a new test case, "SPARK-54908: dateFormat option is applied during 
JSON schema inference", in JsonInferSchemaSuite.scala. It tests 5 cases 
covering basic inference, independent inference, precedence, and mixed fields:
   `mvn test -Dtest=JsonInferSchemaSuite -pl sql/catalyst`
   2. Ran the full catalyst test suite to ensure no existing functionality was 
broken:
   `mvn test -pl sql/catalyst`
   3. Ran Scalastyle linting: `./dev/scalastyle`
   
   ### Was this patch authored or co-authored using generative AI tooling?
   Yes: Cursor 2.3.15
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

