viirya opened a new pull request, #56842: URL: https://github.com/apache/spark/pull/56842
### What changes were proposed in this pull request?

The default in-memory columnar cache serializer (`DefaultCachedBatchSerializer`) did not support `TimestampNTZNanosType` / `TimestampLTZNanosType`. Caching a DataFrame with such a column failed at materialization with `not support type: TimestampNTZNanosType(9)`, because none of the cache's type-dispatch sites had a case for them.

This adds full support, following the fixed-width multi-field pattern already used by `CalendarInterval`. The physical value `TimestampNanosVal` is a fixed 16-byte payload (an 8-byte `epochMicros` plus an 8-byte word holding `nanosWithinMicro`), so it maps cleanly onto that pattern:

- **`ColumnType`**: a `TIMESTAMP_NANOS` column type (with `TIMESTAMP_NTZ_NANOS` / `TIMESTAMP_LTZ_NANOS` singletons) whose `append`/`extract` read and write the 16-byte payload, with a `MutableUnsafeRow` direct-copy fast path.
- **`ColumnBuilder`, `ColumnAccessor`**: builder and accessor classes plus dispatch cases.
- **`ColumnStats`**: a `TimestampNanosColumnStats` collector (fixed size, no min/max bounds).
- **`GenerateColumnAccessor`**: the codegen accessor-class selection and initialization branch.

`TIMESTAMP_NTZ` and `TIMESTAMP_LTZ` nanos types share the same storage and differ only by physical type and row getter/setter, so the encode/decode logic is shared between them.

### Why are the changes needed?

Nanosecond-precision timestamp types are otherwise unsupported by the cache, so `df.cache()` on a column of these types throws. With this change such DataFrames cache and read back correctly, consistent with the microsecond `TIMESTAMP_NTZ` / `TIMESTAMP` types which the cache already supports.

### Does this PR introduce _any_ user-facing change?

Yes. Previously, caching a DataFrame containing a `TIMESTAMP_NTZ(p)` / `TIMESTAMP_LTZ(p)` column with `p` in the nanosecond range threw `not support type`. Now it caches and reads back the values, including sub-microsecond precision.

### How was this patch tested?

- `ColumnTypeSuite`: append/extract round-trip for `TIMESTAMP_NTZ_NANOS` and `TIMESTAMP_LTZ_NANOS` (random values), plus `defaultSize` checks.
- `InMemoryColumnarQuerySuite`: an end-to-end cache roundtrip for both nanos types, with the vectorized reader both on and off, covering sub-microsecond precision and null values.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code
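
### Illustrative sketch of the 16-byte payload

To illustrate the fixed-width layout described under the proposed changes (an 8-byte `epochMicros` followed by an 8-byte word holding `nanosWithinMicro`), here is a minimal standalone sketch of an append/extract round trip over a `java.nio.ByteBuffer`. It is not the code in this PR: the class name `TimestampNanosLayoutSketch`, the `PAYLOAD_SIZE` constant, the native byte order, and the 0..999 range check on `nanosWithinMicro` are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative only; a hypothetical stand-in, not Spark's ColumnType implementation. */
final class TimestampNanosLayoutSketch {
    /** Fixed payload width: 8-byte epochMicros + 8-byte word for nanosWithinMicro. */
    static final int PAYLOAD_SIZE = 16;

    /** Appends one value to the buffer as two consecutive 8-byte words. */
    static void append(long epochMicros, long nanosWithinMicro, ByteBuffer buffer) {
        // Assumption: the sub-microsecond part is a nanosecond count in [0, 999].
        if (nanosWithinMicro < 0 || nanosWithinMicro > 999) {
            throw new IllegalArgumentException(
                "nanosWithinMicro out of range: " + nanosWithinMicro);
        }
        buffer.putLong(epochMicros);
        buffer.putLong(nanosWithinMicro);
    }

    /** Reads one value back and returns {epochMicros, nanosWithinMicro}. */
    static long[] extract(ByteBuffer buffer) {
        long epochMicros = buffer.getLong();
        long nanosWithinMicro = buffer.getLong();
        return new long[] {epochMicros, nanosWithinMicro};
    }

    public static void main(String[] args) {
        // Assumption: native byte order; the sketch only needs writer and reader to agree.
        ByteBuffer buffer =
            ByteBuffer.allocate(2 * PAYLOAD_SIZE).order(ByteOrder.nativeOrder());
        append(1_700_000_000_000_000L, 789L, buffer);
        append(-1L, 1L, buffer); // a pre-epoch value keeps the same fixed width
        buffer.flip();

        long[] first = extract(buffer);
        long[] second = extract(buffer);
        System.out.println(first[0] + " " + first[1]);
        System.out.println(second[0] + " " + second[1]);
    }
}
```

Because both fields are fixed-width, every value occupies exactly 16 bytes regardless of its contents, which is what lets the NTZ and LTZ variants share one encode/decode path and differ only in how the row is read and written.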
