james-willis opened a new pull request, #55990: URL: https://github.com/apache/spark/pull/55990
Backport of #54701 to branch-4.0.

### What changes were proposed in this pull request?

`ColumnarRow.get()`, `ColumnarBatchRow.get()`, and `ColumnarArray.get()` throw `SparkUnsupportedOperationException` when called with a `UserDefinedType` because they have no branch to handle UDTs.

This PR adds UDT handling to all three methods:

- **ColumnarRow** and **ColumnarBatchRow**: Add an `instanceof UserDefinedType` branch that recurses with `udt.sqlType()`, matching the pattern already used in `SpecializedGettersReader.read()`.
- **ColumnarArray**: Change the `handleUserDefinedType` flag from `false` to `true` in the existing call to `SpecializedGettersReader.read()`.

### Why are the changes needed?

The codegen path (`CodeGenerator.getValue()`) unwraps `udt.sqlType()` before generating accessor calls, so UDT columns work when whole-stage codegen is active. However, on the interpreted eval path (when codegen is disabled, falls back, or the number of fields exceeds `spark.sql.codegen.maxFields`), `GetStructField.nullSafeEval` calls `ColumnarRow.get(ordinal, udtType)` directly, which hits the unhandled branch and throws.

### Does this PR introduce _any_ user-facing change?

Yes. UDT columns in columnar data sources (e.g., Parquet) now work correctly on the interpreted evaluation path. Previously they would throw `SparkUnsupportedOperationException`.

### How was this patch tested?

Added 6 new tests in `ColumnarBatchSuite` covering all 3 methods x 2 UDT backing types (primitive `IntegerType` and complex `StructType`). Each test creates columnar vectors with UDT data and verifies that `get()` returns the correct value. Two helper UDT classes (`TestIntUDT`, `TestStructWrapperUDT`) are defined for the tests.

Cherry-picked from 472735cefef on master. The cherry-pick had a trivial conflict in `ColumnarBatchSuite.scala`: the neighboring `[SPARK-55552] Variant` test exists on branch-4.1+ but not on branch-4.0, so its insertion point was contested.
Resolved by keeping only the SPARK-55897 tests (the Variant test is unrelated).

### Was this patch authored or co-authored using generative AI tooling?

Yes. Opus 4.6
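
For readers who want to see the shape of the fix in isolation, the `instanceof UserDefinedType` recursion described under "What changes were proposed" can be sketched roughly as follows. This is a minimal, self-contained illustration and not the patched Spark code: `ToyRow`, `ToyUdt`, `IntType`, and `StringType` are hypothetical stand-ins for `ColumnarRow`, `UserDefinedType`, and Spark's data types, and the plain `UnsupportedOperationException` stands in for `SparkUnsupportedOperationException`.

```java
public class UdtGetSketch {
    // Hypothetical stand-ins for Spark's DataType hierarchy (assumption, not the real classes).
    interface DataType {}
    static final class IntType implements DataType {}
    static final class StringType implements DataType {}

    // Hypothetical UDT: wraps an underlying storage type, as a UDT exposes via sqlType().
    static final class ToyUdt implements DataType {
        private final DataType sqlType;
        ToyUdt(DataType sqlType) { this.sqlType = sqlType; }
        DataType sqlType() { return sqlType; }
    }

    // Hypothetical row with typed getters and a generic get(ordinal, dataType) dispatcher.
    static final class ToyRow {
        private final Object[] values;
        ToyRow(Object[] values) { this.values = values; }

        int getInt(int ordinal) { return (Integer) values[ordinal]; }
        String getString(int ordinal) { return (String) values[ordinal]; }

        Object get(int ordinal, DataType dataType) {
            if (dataType instanceof IntType) {
                return getInt(ordinal);
            } else if (dataType instanceof StringType) {
                return getString(ordinal);
            } else if (dataType instanceof ToyUdt) {
                // The added branch: unwrap the UDT and dispatch again on its storage type.
                return get(ordinal, ((ToyUdt) dataType).sqlType());
            } else {
                // Without the UDT branch above, a UDT argument would end up here.
                throw new UnsupportedOperationException("Datatype not supported " + dataType);
            }
        }
    }

    public static void main(String[] args) {
        ToyRow row = new ToyRow(new Object[] {42, "spark"});
        // A UDT backed by an integer resolves to the integer getter.
        System.out.println(row.get(0, new ToyUdt(new IntType())));
        // Nested UDTs unwrap one level per recursive call.
        System.out.println(row.get(1, new ToyUdt(new ToyUdt(new StringType()))));
    }
}
```

The sketch returns the underlying storage value, which mirrors recursing with `udt.sqlType()`; it does not attempt to model deserialization into the user-facing class.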
