jubins opened a new pull request, #56582: URL: https://github.com/apache/spark/pull/56582
## What is the purpose of the change

Fixes [SPARK-57517](https://issues.apache.org/jira/browse/SPARK-57517): `schema_of_json` throws a `ClassCastException` during analysis when called with a non-string literal (e.g., `SELECT schema_of_json(42)`), instead of surfacing a clean `DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE` error.

The root cause is in `SchemaOfJson.checkInputDataTypes()`: it references a `lazy val json = child.eval().asInstanceOf[UTF8String]` before verifying that the child's type is `StringType`. For an integer literal, the `asInstanceOf[UTF8String]` cast throws `ClassCastException` at analysis time rather than producing a user-facing error.

The companion functions `schema_of_csv` and `schema_of_xml` were fixed for the same issue in SPARK-52234, but `schema_of_json` was missed. This PR applies the same fix: restructuring `checkInputDataTypes` to check `!foldable` → `eval() == null` → `dataType != StringType` in safe order, and removing the unsafe lazy val entirely.

## Brief change log

- `SchemaOfJson.checkInputDataTypes()`: removed the `lazy val json` that performed an unsafe `asInstanceOf[UTF8String]` cast; restructured the condition chain to check for non-foldable input, null input, and wrong type (adding a new `UNEXPECTED_INPUT_TYPE` branch) before delegating to `super.checkInputDataTypes()`
- Added `select schema_of_json(42)` to `json-functions.sql` input
- Added corresponding `DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE` expected entries to `analyzer-results/json-functions.sql.out` and `results/json-functions.sql.out`

## Verifying this change

This change is covered by golden file SQL query tests in `SQLQueryTestSuite`:

- `select schema_of_json(42)`: verifies that a non-string integer literal produces `DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE` at analysis time (previously threw `ClassCastException`)
- Existing tests for `schema_of_json(null)` and `schema_of_json(nonFoldableColumn)` continue to pass, confirming the null and non-foldable branches are unaffected

## Does this pull request potentially affect one of the following parts

- **Dependencies** (does it add or upgrade a dependency): no
- **The public API, i.e., is any changed class annotated with `@Public(Evolving)`**: no, `SchemaOfJson` is an internal catalyst expression
- **The serializers**: no
- **The runtime per-record code paths (performance sensitive)**: no, only affects the analysis-time type check path
- **Anything that affects deployment or recovery**: no
- **The S3 file system connector**: no

## Documentation

- Does this pull request introduce a new feature? no
- If yes, how is the feature documented? not applicable

## Was generative AI tooling used to co-author this PR?

- [x] Yes. Claude Code was used as a pair-programming assistant. All code was written, understood, and verified by the author.

Generated-by: Claude Opus 4.8
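
For readers who want to see why the ordering matters, here is a minimal standalone sketch of the check chain described in the purpose section above. It does not use Spark: `Expr`, `CheckResult`, `Mismatch`, `checkUnsafe`, and `check` are hypothetical stand-ins, `String` takes the place of `UTF8String`, and the `NON_FOLDABLE_INPUT` and `UNEXPECTED_NULL` labels are assumed names for the two existing branches (only `UNEXPECTED_INPUT_TYPE` is named in this PR). It is a model of the idea, not the actual patch.

```scala
// Standalone model of the check ordering; it does not depend on Spark.
// All type and method names below are hypothetical stand-ins for Catalyst's.
import scala.util.Try

object SchemaOfJsonCheckSketch {
  sealed trait DataType
  case object StringType extends DataType
  case object IntegerType extends DataType
  case object NullType extends DataType

  // Toy expression: `value` is what eval() returns for a foldable input.
  final case class Expr(dataType: DataType, foldable: Boolean, value: Any) {
    def eval(): Any = value
  }

  sealed trait CheckResult
  case object Success extends CheckResult
  final case class Mismatch(errorSubClass: String) extends CheckResult

  // Pre-fix shape: forcing the lazy val performs the cast before the
  // child's type has been verified, so a non-string value throws.
  def checkUnsafe(child: Expr): CheckResult = {
    lazy val json = child.eval().asInstanceOf[String]
    if (child.foldable && json != null) Success
    else if (!child.foldable) Mismatch("NON_FOLDABLE_INPUT") // assumed label
    else Mismatch("UNEXPECTED_NULL")                         // assumed label
  }

  // Post-fix shape: no cast at all; every guard is safe for any input type.
  def check(child: Expr): CheckResult = {
    if (!child.foldable) Mismatch("NON_FOLDABLE_INPUT")        // assumed label
    else if (child.eval() == null) Mismatch("UNEXPECTED_NULL") // assumed label
    else if (child.dataType != StringType) Mismatch("UNEXPECTED_INPUT_TYPE")
    else Success // stands in for delegating to super.checkInputDataTypes()
  }

  def main(args: Array[String]): Unit = {
    val intLiteral  = Expr(IntegerType, foldable = true, value = 42)
    val nullLiteral = Expr(NullType, foldable = true, value = null)
    val column      = Expr(StringType, foldable = false, value = null)
    val jsonLiteral = Expr(StringType, foldable = true, value = """{"a": 1}""")

    // The unsafe variant fails with a ClassCastException on the int literal.
    println(Try(checkUnsafe(intLiteral)))
    // The reordered variant returns a result for every input.
    Seq(intLiteral, nullLiteral, column, jsonLiteral).foreach(e => println(check(e)))
  }
}
```

In this model the null check runs before the type check, so a null literal still hits the null branch, which matches the claim that the existing null and non-foldable tests are unaffected.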
