Re: [PR] [SPARK-57100][SQL] Add columnar (ColumnVector) support for nanosecond timestamp types [spark]

via GitHub Sat, 30 May 2026 02:18:35 -0700


peter-toth commented on code in PR #56198:
URL: https://github.com/apache/spark/pull/56198#discussion_r3328503541



##########
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java:
##########
@@ -751,7 +762,10 @@ public final int appendStruct(boolean isNull) {
       putNull(elementsAppended);
       elementsAppended++;
       for (WritableColumnVector c: childColumns) {
-        if (c.type instanceof StructType || c.type instanceof VariantType) {
+        if (c.type instanceof StructType || c.type instanceof VariantType
+            || c.type instanceof CalendarIntervalType

Review Comment:
   Adding `CalendarIntervalType` here isn't just supporting the new nanos types 
— it also fixes a previously-latent bug for nested struct-of-interval. Pre-PR, 
when an outer struct column was appended as null and one of its child columns 
was a `CalendarInterval`, the interval child took the `else` branch 
(`c.appendNull()`), advancing only the interval's own cursor and leaving its 
three grandchild primitive columns (`months`/`days`/`microseconds`) 
un-advanced. Subsequent rows would then write into the wrong grandchild slots — 
silent skew.
   
   The fix is correct, but:
   
   1. The PR description doesn't mention this. Worth one line so reviewers 
don't miss the interval-semantics change.
   2. There's no test exercising the new recursion. The minimum case is a 
struct-of-interval column with at least one null parent row, then read back the 
next non-null row's children to verify they aren't shifted. Same shape extends 
to struct-of-`TimestampNanos`.
   
   Up to you whether to split out into a separate commit or keep bundled.



##########
sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnVectorSuite.scala:
##########
@@ -379,6 +379,50 @@ class ColumnVectorSuite extends SparkFunSuite with 
SQLHelper {
     }
   }
 
+  testVectors("timestamp_ntz_nanos", 10, TimestampNTZNanosType(9)) { 
testVector =>
+    val values = (0 until 10).map(i => TimestampNanosVal.fromParts(i * 1000L, 
i.toShort))
+    values.foreach { v =>
+      testVector.putNotNull(testVector.elementsAppended)
+      testVector.putTimestampNTZNanos(testVector.elementsAppended, v)
+      testVector.elementsAppended += 1
+    }
+    values.zipWithIndex.foreach { case (v, i) =>
+      assert(testVector.getTimestampNTZNanos(i) === v)
+    }
+    testVector.putNull(0)
+    assert(testVector.isNullAt(0))
+  }
+
+  testVectors("timestamp_ltz_nanos", 10, TimestampLTZNanosType(9)) { 
testVector =>
+    val values = (0 until 10).map(i => TimestampNanosVal.fromParts(i * 1000L, 
i.toShort))
+    values.foreach { v =>
+      testVector.putNotNull(testVector.elementsAppended)
+      testVector.putTimestampLTZNanos(testVector.elementsAppended, v)
+      testVector.elementsAppended += 1
+    }
+    values.zipWithIndex.foreach { case (v, i) =>
+      assert(testVector.getTimestampLTZNanos(i) === v)
+    }
+    testVector.putNull(0)
+    assert(testVector.isNullAt(0))
+  }
+
+  testVectors("mutable ColumnarRow with TimestampNTZNanosType", 5,
+      TimestampNTZNanosType(9)) { testVector =>
+    val mutableRow = new MutableColumnarRow(Array(testVector))
+    val values = (0 until 5).map(i => TimestampNanosVal.fromParts(i * 100L, 
i.toShort))
+    values.zipWithIndex.foreach { case (v, i) =>
+      mutableRow.rowId = i
+      mutableRow.setTimestampNTZNanos(0, v)
+    }
+    values.zipWithIndex.foreach { case (v, i) =>
+      mutableRow.rowId = i
+      assert(mutableRow.getTimestampNTZNanos(0) === v)
+      assert(mutableRow.get(0, TimestampNTZNanosType(9)) === v)
+      assert(mutableRow.copy().get(0, TimestampNTZNanosType(9)) === v)
+    }
+  }

Review Comment:
   The PR adds a `MutableColumnarRow` test for `TimestampNTZNanosType` but not 
for `TimestampLTZNanosType`. The LTZ paths (`setTimestampLTZNanos` at 
`MutableColumnarRow.java:348`, the `update(TimestampLTZNanosType)` and 
`get(TimestampLTZNanosType)` dispatches at `MutableColumnarRow.java:240,272`, 
and the `copy()` branch at `MutableColumnarRow.java:104`) aren't exercised. 
Adding a parallel `mutable ColumnarRow with TimestampLTZNanosType` block right 
after this one closes the gap.



##########
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVectorUtils.java:
##########
@@ -106,6 +107,10 @@ public static void populate(
       } else if (pdt instanceof PhysicalCalendarIntervalType) {
         // The value of `numRows` is irrelevant.
         col.setCalendarInterval((CalendarInterval) row.get(fieldIdx, t));
+      } else if (pdt instanceof PhysicalTimestampNTZNanosType) {

Review Comment:
   The two `else if` branches are identical. `appendValue` below at line 178 
already collapses both nanos types into one condition with `||`. Suggest the 
same here:
   
   ```java
   } else if (pdt instanceof PhysicalTimestampNTZNanosType ||
       pdt instanceof PhysicalTimestampLTZNanosType) {
     col.setTimestampNanosVal((TimestampNanosVal) row.get(fieldIdx, t));
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [SPARK-57100][SQL] Add columnar (ColumnVector) support for nanosecond timestamp types [spark]

Reply via email to