MaxGekk opened a new pull request, #56235:
URL: https://github.com/apache/spark/pull/56235
### What changes were proposed in this pull request?
Include `CalendarIntervalType` in the recursion guard of
`WritableColumnVector.appendStruct(boolean isNull)`, so that appending a NULL
parent struct cascades `appendStruct(true)` into a `CalendarInterval` child
column and advances all three of its grandchild columns (months / days /
microseconds).
```java
for (WritableColumnVector c: childColumns) {
  if (c.type instanceof StructType || c.type instanceof VariantType
      || c.type instanceof CalendarIntervalType) {
    c.appendStruct(true);
  } else {
    c.appendNull();
  }
}
```
This was split out of the nanosecond-timestamp `ColumnVector` PR
(SPARK-57100, #56198) per review, since it is an independent fix.
### Why are the changes needed?
A `CalendarInterval` column is struct-shaped: it is backed by three primitive
child columns (`months` as int, `days` as int, `microseconds` as long), which
are grandchildren of the enclosing struct column. The recursion guard in
`appendStruct` only handled `StructType` and `VariantType`, so an interval
child column took the `else` branch (`c.appendNull()`), which advances only the
interval column's own cursor and leaves the three grandchild cursors
un-advanced.
As a result, for a struct column with a `CalendarInterval` field, appending
a NULL parent row left the interval's grandchild cursors behind by one. A
subsequent non-null row then wrote its `months`/`days`/`microseconds` into the
wrong (earlier) grandchild slots, and reading that row back returned a skewed
value. This is silent data corruption in the nested struct-of-interval case.
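The cursor skew described above can be reproduced in a small standalone model. This is only a sketch: `ToyColumn`, `IntervalCursorSketch`, and the `guardIncludesInterval` flag are hypothetical names invented for illustration, and the model is not Spark's `WritableColumnVector`. It assumes each column keeps its own append cursor and that leaf storage is preallocated with zeros.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IntervalCursorSketch {

    /** Hypothetical toy column: every column tracks its own append cursor. */
    static final class ToyColumn {
        final String kind;                           // "struct", "interval", or "leaf"
        final List<ToyColumn> children = new ArrayList<>();
        final long[] data = new long[8];             // preallocated leaf storage, zero-filled
        int rows = 0;                                // this column's own cursor

        ToyColumn(String kind, ToyColumn... kids) {
            this.kind = kind;
            children.addAll(Arrays.asList(kids));
        }

        // Advances only this column's cursor; children are not touched.
        void appendNull() { rows++; }

        void appendLong(long v) { data[rows++] = v; }

        // Mirrors the shape of the guard in the PR: a NULL struct cascades into
        // struct-shaped children and calls appendNull() on everything else.
        void appendStruct(boolean isNull, boolean guardIncludesInterval) {
            rows++;
            if (!isNull) return;                     // non-null: caller appends child values
            for (ToyColumn c : children) {
                boolean recurse = c.kind.equals("struct")
                    || (guardIncludesInterval && c.kind.equals("interval"));
                if (recurse) {
                    c.appendStruct(true, guardIncludesInterval);
                } else {
                    c.appendNull();
                }
            }
        }
    }

    /** Appends NULL, (1, 2, 3), (4, 5, 6) and returns what row 1 reads back. */
    static long[] readBackRowOne(boolean guardIncludesInterval) {
        ToyColumn months = new ToyColumn("leaf");
        ToyColumn days = new ToyColumn("leaf");
        ToyColumn micros = new ToyColumn("leaf");
        ToyColumn interval = new ToyColumn("interval", months, days, micros);
        ToyColumn parent = new ToyColumn("struct", interval);

        parent.appendStruct(true, guardIncludesInterval);        // row 0: NULL parent
        long[][] intervals = {{1, 2, 3}, {4, 5, 6}};             // rows 1 and 2
        for (long[] iv : intervals) {
            parent.appendStruct(false, guardIncludesInterval);
            interval.appendStruct(false, guardIncludesInterval);
            months.appendLong(iv[0]);
            days.appendLong(iv[1]);
            micros.appendLong(iv[2]);
        }
        return new long[] {months.data[1], days.data[1], micros.data[1]};
    }

    public static void main(String[] args) {
        // Guard without the interval case: row 1 reads row 2's value.
        System.out.println(Arrays.toString(readBackRowOne(false))); // [4, 5, 6]
        // Guard with the interval case: row 1 reads its own value.
        System.out.println(Arrays.toString(readBackRowOne(true)));  // [1, 2, 3]
    }
}
```

In this model, the guard without the interval case leaves the three leaf cursors at 0 after the NULL row, so row 1's values land in slot 0 and a read of row 1 picks up whatever was written next.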
### Does this PR introduce _any_ user-facing change?
Yes. Reading back a struct-of-interval column that contains a NULL parent
row followed by a non-null row now returns the correct interval value.
Previously it returned a skewed, corrupted value.
### How was this patch tested?
Added a unit test to `ColumnarBatchSuite` that uses `RowToColumnConverter`
to convert a null parent struct followed by non-null struct-of-interval rows,
and verifies the interval values are read back correctly. The test fails
without the fix and passes with it.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.8