peter-toth commented on code in PR #56198:
URL: https://github.com/apache/spark/pull/56198#discussion_r3328503541
##########
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java:
##########
@@ -751,7 +762,10 @@ public final int appendStruct(boolean isNull) {
putNull(elementsAppended);
elementsAppended++;
for (WritableColumnVector c: childColumns) {
- if (c.type instanceof StructType || c.type instanceof VariantType) {
+ if (c.type instanceof StructType || c.type instanceof VariantType
+ || c.type instanceof CalendarIntervalType
Review Comment:
Adding `CalendarIntervalType` here isn't just supporting the new nanos types
— it also fixes a previously-latent bug for nested struct-of-interval. Pre-PR,
when an outer struct column was appended as null and one of its child columns
was a `CalendarInterval`, the interval child took the `else` branch
(`c.appendNull()`), advancing only the interval's own cursor and leaving its
three grandchild primitive columns (`months`/`days`/`microseconds`)
un-advanced. Subsequent rows would then write into the wrong grandchild slots —
silent skew.
The fix is correct, but:
1. The PR description doesn't mention this. Worth one line so reviewers
don't miss the interval-semantics change.
2. There's no test exercising the new recursion. The minimum case is a
struct-of-interval column with at least one null parent row, then read back the
next non-null row's children to verify they aren't shifted. Same shape extends
to struct-of-`TimestampNanos`.
Up to you whether to split out into a separate commit or keep bundled.
##########
sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnVectorSuite.scala:
##########
@@ -379,6 +379,50 @@ class ColumnVectorSuite extends SparkFunSuite with
SQLHelper {
}
}
+ testVectors("timestamp_ntz_nanos", 10, TimestampNTZNanosType(9)) {
testVector =>
+ val values = (0 until 10).map(i => TimestampNanosVal.fromParts(i * 1000L,
i.toShort))
+ values.foreach { v =>
+ testVector.putNotNull(testVector.elementsAppended)
+ testVector.putTimestampNTZNanos(testVector.elementsAppended, v)
+ testVector.elementsAppended += 1
+ }
+ values.zipWithIndex.foreach { case (v, i) =>
+ assert(testVector.getTimestampNTZNanos(i) === v)
+ }
+ testVector.putNull(0)
+ assert(testVector.isNullAt(0))
+ }
+
+ testVectors("timestamp_ltz_nanos", 10, TimestampLTZNanosType(9)) {
testVector =>
+ val values = (0 until 10).map(i => TimestampNanosVal.fromParts(i * 1000L,
i.toShort))
+ values.foreach { v =>
+ testVector.putNotNull(testVector.elementsAppended)
+ testVector.putTimestampLTZNanos(testVector.elementsAppended, v)
+ testVector.elementsAppended += 1
+ }
+ values.zipWithIndex.foreach { case (v, i) =>
+ assert(testVector.getTimestampLTZNanos(i) === v)
+ }
+ testVector.putNull(0)
+ assert(testVector.isNullAt(0))
+ }
+
+ testVectors("mutable ColumnarRow with TimestampNTZNanosType", 5,
+ TimestampNTZNanosType(9)) { testVector =>
+ val mutableRow = new MutableColumnarRow(Array(testVector))
+ val values = (0 until 5).map(i => TimestampNanosVal.fromParts(i * 100L,
i.toShort))
+ values.zipWithIndex.foreach { case (v, i) =>
+ mutableRow.rowId = i
+ mutableRow.setTimestampNTZNanos(0, v)
+ }
+ values.zipWithIndex.foreach { case (v, i) =>
+ mutableRow.rowId = i
+ assert(mutableRow.getTimestampNTZNanos(0) === v)
+ assert(mutableRow.get(0, TimestampNTZNanosType(9)) === v)
+ assert(mutableRow.copy().get(0, TimestampNTZNanosType(9)) === v)
+ }
+ }
Review Comment:
The PR adds a `MutableColumnarRow` test for `TimestampNTZNanosType` but not
for `TimestampLTZNanosType`. The LTZ paths (`setTimestampLTZNanos` at
`MutableColumnarRow.java:348`, the `update(TimestampLTZNanosType)` and
`get(TimestampLTZNanosType)` dispatches at `MutableColumnarRow.java:240,272`,
and the `copy()` branch at `MutableColumnarRow.java:104`) aren't exercised.
Adding a parallel `mutable ColumnarRow with TimestampLTZNanosType` block right
after this one closes the gap.
##########
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVectorUtils.java:
##########
@@ -106,6 +107,10 @@ public static void populate(
} else if (pdt instanceof PhysicalCalendarIntervalType) {
// The value of `numRows` is irrelevant.
col.setCalendarInterval((CalendarInterval) row.get(fieldIdx, t));
+ } else if (pdt instanceof PhysicalTimestampNTZNanosType) {
Review Comment:
The two `else if` branches are identical. `appendValue` below at line 178
already collapses both nanos types into one condition with `||`. Suggest the
same here:
```java
} else if (pdt instanceof PhysicalTimestampNTZNanosType ||
pdt instanceof PhysicalTimestampLTZNanosType) {
col.setTimestampNanosVal((TimestampNanosVal) row.get(fieldIdx, t));
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]