[PR] [SPARK-57103][SQL] Add hashing for nanosecond timestamp types [spark]

via GitHub Fri, 29 May 2026 06:11:27 -0700


stevomitric opened a new pull request, #56203:
URL: https://github.com/apache/spark/pull/56203


   
   ### What changes were proposed in this pull request?
   Add hashing support for the nanosecond timestamp types 
`TimestampNTZNanosType(p)` and `TimestampLTZNanosType(p)`, in both the 
interpreted and codegen paths of hash.scala:
   
   - Murmur3Hash / XxHash64: mix both fields, following the existing 
CalendarInterval pattern: 
`hashInt(nanosWithinMicro, hashLong(epochMicros, seed))`.
   - HiveHash: a dedicated `hashTimestampNanos` that extends the existing 
`hashTimestamp` with the sub-microsecond nanoseconds, using the same 
`* 37 + field` idiom as `hashCalendarInterval`.
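
For illustration only, the two mixing strategies above can be sketched in standalone Scala. This is not the code in this PR: `TimestampNanos`, `NanosHashSketch`, and `hiveStyleHash` are hypothetical names, and the `hashInt` / `hashLong` helpers here are stand-ins built on `scala.util.hashing.MurmurHash3`, so they do not produce the same values as Spark's hashers. The Hive-style function assumes the existing timestamp hash is computed elsewhere and passed in.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical stand-in for a nanosecond timestamp value: microseconds since
// the epoch plus the leftover nanoseconds within that microsecond.
final case class TimestampNanos(epochMicros: Long, nanosWithinMicro: Int)

object NanosHashSketch {
  // Stand-in for a seeded int hasher (assumption: NOT Spark's implementation).
  def hashInt(value: Int, seed: Int): Int =
    MurmurHash3.finalizeHash(MurmurHash3.mix(seed, value), 4)

  // Stand-in for a seeded long hasher: mixes the low and high 32-bit halves.
  def hashLong(value: Long, seed: Int): Int = {
    val low = value.toInt
    val high = (value >>> 32).toInt
    MurmurHash3.finalizeHash(MurmurHash3.mix(MurmurHash3.mix(seed, low), high), 8)
  }

  // Chained pattern from the description: the hash of epochMicros becomes
  // the seed used to hash nanosWithinMicro.
  def hash(ts: TimestampNanos, seed: Int): Int =
    hashInt(ts.nanosWithinMicro, hashLong(ts.epochMicros, seed))

  // Hive-style idiom: fold one extra field into an existing hash with * 37 + field.
  // baseTimestampHash is assumed to come from the existing timestamp hash.
  def hiveStyleHash(baseTimestampHash: Int, nanosWithinMicro: Int): Int =
    baseTimestampHash * 37 + nanosWithinMicro
}
```

In this sketch, two values that share `epochMicros` but differ in `nanosWithinMicro` hash differently for the same seed, which is the property a microsecond-only hash would lack.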
   
   ### Why are the changes needed?
   Hash-based operations (GROUP BY, DISTINCT, joins) failed on nanosecond 
timestamp columns.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   New unit tests.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   Generated-by: Claude Opus 4.7
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

