ZitingShen opened a new pull request, #56536:
URL: https://github.com/apache/spark/pull/56536
### What changes were proposed in this pull request?
- Make `ColumnarToRowExec` expose nullable row output when materializing a
columnar batch, because the batch null bitmap is the execution-time source of
truth.
- Rebind parent input attributes after inserting row/columnar transitions so
downstream row operators keep the execution-time nullability instead of stale
planned attributes.
- Add a regression test covering whole-stage codegen, split consume, row
boundaries, partial hash aggregate, generate, and take-ordered/project paths
when a non-nullable columnar output physically contains a null.
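
The first two bullets can be pictured with a small toy model. This is an illustrative sketch only, not Spark code and not the code in this PR: the `Attribute` class and the `columnar_to_row_output` and `rebind` helpers are hypothetical names, and matching parent inputs to child outputs by an `expr_id` field is an assumption made for the example.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class Attribute:
    """Hypothetical stand-in for a planned output attribute."""
    expr_id: int
    name: str
    nullable: bool


def columnar_to_row_output(child_output: List[Attribute]) -> List[Attribute]:
    """Expose every materialized column as nullable, whatever the plan said."""
    return [replace(attr, nullable=True) for attr in child_output]


def rebind(parent_inputs: List[Attribute],
           child_output: List[Attribute]) -> List[Attribute]:
    """Swap each parent input for the child's attribute with the same id.

    Inputs with no matching child attribute are left unchanged.
    """
    by_id: Dict[int, Attribute] = {attr.expr_id: attr for attr in child_output}
    return [by_id.get(attr.expr_id, attr) for attr in parent_inputs]


# The scan was planned with non-nullable columns.
scan_output = [Attribute(1, "id", nullable=False),
               Attribute(2, "name", nullable=False)]

# The row transition widens nullability for everything it materializes.
transition_output = columnar_to_row_output(scan_output)

# A parent operator still holds the stale, non-nullable attribute.
parent_inputs = [Attribute(2, "name", nullable=False)]
rebound = rebind(parent_inputs, transition_output)
```

After `rebind`, the parent sees `name` as nullable, so any code generated from that attribute would keep its null check.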
### Why are the changes needed?
`ColumnarToRowExec` can materialize a real null from a columnar batch even
when the planned output attribute is non-nullable. Downstream row codegen can
then trust the stale non-nullable attribute, skip the null check, and fail
while materializing or consuming the row.
One observed failing plan had this shape:
```text
HashAggregate
+- HashAggregate
   +- ...
      +- ColumnarToRow
         +- Scan parquet ...
```
In that case the Parquet reader produced a physical null for a column whose
planned output attribute was non-nullable, and downstream generated row code
failed with a `UTF8String.getBaseObject()` `NullPointerException`.
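
The failure mode can be reproduced in miniature. The sketch below is a toy model, not Spark's generated code: `Attribute` and `make_length_reader` are hypothetical names, and Python's `TypeError` on `len(None)` stands in for the `NullPointerException` described above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Attribute:
    """Hypothetical stand-in for a planned output attribute."""
    name: str
    nullable: bool


def make_length_reader(attr: Attribute) -> Callable[[Optional[str]], Optional[int]]:
    """Mimic codegen: emit the null check only when the attribute is nullable."""
    if attr.nullable:
        return lambda value: None if value is None else len(value)
    # Planned as non-nullable, so the null check is elided.
    return lambda value: len(value)


# The batch physically contains a null in the second position.
column = ["a", None, "ccc"]

# Reader built from the stale, non-nullable attribute.
stale_reader = make_length_reader(Attribute("s", nullable=False))
try:
    [stale_reader(value) for value in column]
    failed = False
except TypeError:
    # Toy analogue of the NullPointerException in generated row code.
    failed = True

# Reader built from the execution-time (nullable) attribute.
safe_reader = make_length_reader(Attribute("s", nullable=True))
lengths = [safe_reader(value) for value in column]
```

With the stale attribute the reader dereferences the null and fails; with the nullable attribute the null passes through unchanged.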
### Does this PR introduce _any_ user-facing change?
Yes. Queries that read a physical null through `ColumnarToRowExec` despite
stale non-nullable planned metadata now preserve the null through row
materialization instead of crashing in downstream row codegen.
### How was this patch tested?
- Added `SparkPlanSuite` regression coverage for null materialization
through `ColumnarToRowExec` and representative downstream row consumers.
- Ran `git diff --check`.
- Attempted to run the new regression test, but it did not execute: the local
checkout only has Java `25.0.2`, and current Spark requires Java `25.0.3` or
later before test execution. The command was
`build/sbt 'sql/testOnly org.apache.spark.sql.execution.SparkPlanSuite -- -z "ColumnarToRowExec should materialize null values from non-nullable columnar output"'`.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Codex