Re: [PR] [SPARK-57735][SQL] Support nanosecond-precision timestamp types in the in-memory columnar cache [spark]

via GitHub Sun, 28 Jun 2026 00:01:36 -0700


MaxGekk commented on code in PR #56842:
URL: https://github.com/apache/spark/pull/56842#discussion_r3487492208



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnStats.scala:
##########
@@ -326,6 +326,21 @@ private[columnar] final class IntervalColumnStats extends ColumnStats {
     Array[Any](null, null, nullCount, count, sizeInBytes)
 }
 
+private[columnar] final class TimestampNanosColumnStats extends ColumnStats {

Review Comment:
   `TimestampNanosColumnStats` emits `null`/`null` for lower/upper (the 
`CalendarInterval` / `IntervalColumnStats` pattern), so cached 
nanosecond-timestamp columns get no batch-level partition pruning.
   
   The same logical type at micro precision takes a different path: 
`TimestampType`/`TimestampNTZType` -> `LongColumnBuilder` -> `LongColumnStats`, 
which collects min/max. So a range filter (`WHERE ts > '...'`) over a cached 
`TIMESTAMP_NTZ(6)` column skips non-matching batches, while the same filter 
over a cached `TIMESTAMP_NTZ(9)` column scans every batch.
   
   `TimestampNanosVal` is `Comparable` (its total order is calendar order), and 
ordered non-primitive cache types already keep bounds — `DecimalColumnStats` 
collects `Decimal` min/max. So tracking `upper`/`lower` as `TimestampNanosVal` 
here (modeled on `DecimalColumnStats` rather than `IntervalColumnStats`) would 
preserve the pruning the micro path provides.
   
   Not a correctness issue: the feature works. Is the bounds-less choice 
intentional (following `CalendarInterval`), or is it worth collecting min/max so 
cached nanos timestamps prune like micro timestamps?
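
A minimal, self-contained sketch of the bounds-keeping idea, not the PR's code. `NanosTs` is a hypothetical stand-in for `TimestampNanosVal` (its field names are assumptions), and `NanosBoundsStats` is an illustrative collector that does not extend Spark's `ColumnStats`. It only shows how tracking `lower`/`upper` on a comparable value lets a range filter skip a batch.

```scala
// Hypothetical stand-in for the PR's TimestampNanosVal; the field names are
// assumptions made for illustration only.
final case class NanosTs(epochMicros: Long, nanosWithinMicro: Int)
    extends Ordered[NanosTs] {
  // Order by microseconds first, then by the sub-microsecond remainder.
  def compare(that: NanosTs): Int = {
    val c = java.lang.Long.compare(epochMicros, that.epochMicros)
    if (c != 0) c
    else java.lang.Integer.compare(nanosWithinMicro, that.nanosWithinMicro)
  }
}

// Illustrative collector (not Spark's ColumnStats API): keeps min/max of the
// non-null values seen in one cached batch, plus null and row counts.
final class NanosBoundsStats {
  private var lower: NanosTs = null
  private var upper: NanosTs = null
  private var nullCount = 0
  private var count = 0

  def gather(value: Option[NanosTs]): Unit = {
    count += 1
    value match {
      case Some(v) =>
        if (lower == null || v < lower) lower = v
        if (upper == null || v > upper) upper = v
      case None =>
        nullCount += 1
    }
  }

  // Slot order follows the array in the diff: lower, upper, nullCount, count.
  // sizeInBytes is omitted from this sketch.
  def collected: Array[Any] = Array[Any](lower, upper, nullCount, count)

  // For a filter `ts > bound`, a batch may hold matches only if its upper
  // bound is strictly above `bound`; an all-null batch can always be skipped.
  def mightContainGreaterThan(bound: NanosTs): Boolean =
    upper != null && upper > bound
}
```

With `null`/`null` bounds, as in the current patch, no such check is possible, so every batch has to be scanned.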



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

