jubins opened a new pull request, #56582:
URL: https://github.com/apache/spark/pull/56582

   ## What is the purpose of the change
   
   Fixes [SPARK-57517](https://issues.apache.org/jira/browse/SPARK-57517) — 
`schema_of_json` throws a `ClassCastException` during analysis when called with 
a non-string literal (e.g., `SELECT schema_of_json(42)`), instead of surfacing 
a clean `DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE` error.
   
   The root cause is in `SchemaOfJson.checkInputDataTypes()`: it references a 
`lazy val json = child.eval().asInstanceOf[UTF8String]` before verifying that 
the child's type is `StringType`. For an integer literal, the 
`asInstanceOf[UTF8String]` cast throws `ClassCastException` at analysis time 
rather than producing a user-facing error.
   
   The companion functions `schema_of_csv` and `schema_of_xml` were fixed for 
the same issue in SPARK-52234, but `schema_of_json` was missed. This PR applies 
the same fix: `checkInputDataTypes` now checks `!foldable`, then 
`eval() == null`, then `dataType != StringType`, in that order, so no cast runs 
before the type is verified. The unsafe lazy val is removed entirely.
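
   The ordering problem can be illustrated with a minimal, Spark-free sketch. 
Everything below is a hypothetical stand-in, not the actual Catalyst code: 
`Expr`, `CheckResult`, `Mismatch`, `unsafeCheck`, and `safeCheck` are invented 
for illustration, `String` stands in for `UTF8String`, and the 
`NON_FOLDABLE_INPUT` and `UNEXPECTED_NULL` labels on the first two branches are 
an assumption about how those branches are reported.

```scala
// Simplified stand-ins for Catalyst types; these are NOT Spark's real classes.
sealed trait DataType
case object StringType extends DataType
case object IntegerType extends DataType

// A toy expression: its declared type, whether it is foldable, and its constant value.
final case class Expr(dataType: DataType, foldable: Boolean, value: Any) {
  def eval(): Any = value
}

sealed trait CheckResult
case object Success extends CheckResult
final case class Mismatch(errorSubClass: String) extends CheckResult

object SchemaOfJsonCheckSketch {
  // Unsafe shape: evaluating `json` casts the value before the type is verified,
  // so an integer literal fails with ClassCastException instead of a Mismatch.
  def unsafeCheck(child: Expr): CheckResult = {
    lazy val json = child.eval().asInstanceOf[String] // String stands in for UTF8String
    if (child.foldable && json != null) Success
    else if (!child.foldable) Mismatch("NON_FOLDABLE_INPUT")
    else Mismatch("UNEXPECTED_NULL")
  }

  // Safe shape: foldability, then null, then type; no cast is ever performed.
  def safeCheck(child: Expr): CheckResult =
    if (!child.foldable) Mismatch("NON_FOLDABLE_INPUT")
    else if (child.eval() == null) Mismatch("UNEXPECTED_NULL")
    else if (child.dataType != StringType) Mismatch("UNEXPECTED_INPUT_TYPE")
    else Success
}
```

   In this sketch, `safeCheck(Expr(IntegerType, foldable = true, 42))` returns 
`Mismatch("UNEXPECTED_INPUT_TYPE")`, while `unsafeCheck` on the same input 
throws `ClassCastException`, mirroring the behavior described above.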
   
   ## Brief change log
   
   - `SchemaOfJson.checkInputDataTypes()`: removed the `lazy val json` that 
performed an unsafe `asInstanceOf[UTF8String]` cast; restructured the condition 
chain to check for non-foldable input, null input, and wrong type (adding a new 
`UNEXPECTED_INPUT_TYPE` branch) before delegating to 
`super.checkInputDataTypes()`
   - Added `select schema_of_json(42)` to `json-functions.sql` input
   - Added corresponding `DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE` expected 
entries to `analyzer-results/json-functions.sql.out` and 
`results/json-functions.sql.out`
   
   ## Verifying this change
   
   This change is covered by golden file SQL query tests in `SQLQueryTestSuite`:
   
   - `select schema_of_json(42)`: verifies that an integer literal (a 
non-string input) produces `DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE` at analysis 
time (previously it threw `ClassCastException`)
   - Existing tests for `schema_of_json(null)` and 
`schema_of_json(nonFoldableColumn)` continue to pass, confirming the null and 
non-foldable branches are unaffected
   
   ## Does this pull request potentially affect one of the following parts
   
   - **Dependencies** (does it add or upgrade a dependency): no
   - **The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`**: no — `SchemaOfJson` is an internal catalyst expression
   - **The serializers**: no
   - **The runtime per-record code paths (performance sensitive)**: no — only 
affects the analysis-time type check path
   - **Anything that affects deployment or recovery**: no
   - **The S3 file system connector**: no
   
   ## Documentation
   
   Does this pull request introduce a new feature? no
   
   If yes, how is the feature documented? not applicable
   
   ## Was generative AI tooling used to co-author this PR?
   
   - [x] Yes — Claude Code was used as a pair-programming assistant. All code 
was written, understood, and verified by the author.
   
   Generated-by: Claude Opus 4.8


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

