MaxGekk opened a new pull request, #56266:
URL: https://github.com/apache/spark/pull/56266

   ### What changes were proposed in this pull request?
   
   This PR registers `TimestampNTZNanosType(p)` and `TimestampLTZNanosType(p)` 
(p in [7, 9]) in the Spark SQL Types Framework (SPARK-53504), gated by 
`spark.sql.types.framework.enabled`.
   
   It is split out of PR #56199 (SPARK-57101) per [review 
feedback](https://github.com/apache/spark/pull/56199#discussion_r3333031096), 
so that the timestamp-nano type registration is reviewed independently of the 
abstract Types Framework method additions. This PR deliberately only 
**overrides existing** `TypeOps` / `TypeApiOps` methods; it introduces **no new 
framework methods**.
   
   Concretely:
   - Add `TimestampNanosTypeOps` (catalyst) with `TimestampNTZNanosTypeOps` / 
`TimestampLTZNanosTypeOps`, registered in `TypeOps.apply()` next to `TimeType`. 
Overrides: `getPhysicalType`, `getJavaClass`, `getRowWriter`, 
`getDefaultLiteral`, `getJavaLiteral`, `getMutableValue`, `toCatalystImpl`, 
`toScala`, `toScalaImpl`.
   - Add `TimestampNanosTypeApiOps` (sql/api) with NTZ/LTZ subclasses, 
registered in `TypeApiOps.apply()`. `format` / `toSQLValue` are interim (based 
on `TimestampNanosVal.toString` with a `TIMESTAMP_NTZ` / `TIMESTAMP_LTZ` 
prefix); `getEncoder` reports the type as unsupported, matching the legacy 
`RowEncoder` fallback.
   - Add `MutableTimestampNanos` to `SpecificInternalRow` to avoid the 
`MutableAny` fallback.
   
   The existing call sites (`PhysicalDataType.apply`, `Literal.default`, 
`InternalRow.getWriter`/`getAccessor`, codegen Java class selection, 
`SpecificInternalRow` mutable columns) already delegate to 
`TypeOps(dt).map(...).getOrElse(legacy)`, so no per-call-site edits are needed 
beyond registration.
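
   The delegate-then-fallback pattern described above can be sketched as follows. This is a minimal, self-contained illustration rather than Spark's actual code: the simplified `DataType` hierarchy, the `frameworkEnabled` variable (standing in for `spark.sql.types.framework.enabled`), the `CodegenSketch` object, and the returned class-name strings are all assumptions made for the example.

```scala
// Illustrative sketch only. Names not mentioned in the PR description
// (frameworkEnabled, CodegenSketch, legacyJavaClass, the string results)
// are hypothetical and do not reflect Spark's real implementation.
sealed trait DataType
case object IntegerType extends DataType
final case class TimestampNTZNanosType(precision: Int) extends DataType {
  require(precision >= 7 && precision <= 9, s"precision must be in [7, 9], got $precision")
}

// Per-type operations that the framework centralizes behind one interface.
trait TypeOps {
  def getJavaClass: String
}

final class TimestampNTZNanosTypeOps(dt: TimestampNTZNanosType) extends TypeOps {
  // Assumed class name, used here purely for illustration.
  override def getJavaClass: String = "TimestampNanosVal"
}

object TypeOps {
  // Stand-in for the internal flag spark.sql.types.framework.enabled.
  @volatile var frameworkEnabled: Boolean = true

  // Returns Some(ops) only for registered types, and only when the flag is on.
  def apply(dt: DataType): Option[TypeOps] = dt match {
    case t: TimestampNTZNanosType if frameworkEnabled =>
      Some(new TimestampNTZNanosTypeOps(t))
    case _ => None
  }
}

object CodegenSketch {
  // Legacy dispatch, kept as the fallback path.
  private def legacyJavaClass(dt: DataType): String = dt match {
    case IntegerType => "int"
    case _: TimestampNTZNanosType => "TimestampNanosVal"
  }

  // Call site: ask the framework first, otherwise use the legacy path.
  def javaClass(dt: DataType): String =
    TypeOps(dt).map(_.getJavaClass).getOrElse(legacyJavaClass(dt))
}
```

   Because the call site already has this shape, registering a new type in `TypeOps.apply()` is enough to route it through the framework, and turning the flag off sends the same call through the legacy branch.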
   
   Out of scope (follow-ups): encoders and `java.time` roundtrip (SPARK-57033), 
Connect proto, Arrow, PySpark conversion, cast/Parquet/ColumnVector, and 
physical ordering/compare/hash.
   
   ### Why are the changes needed?
   
   The logical nanosecond timestamp types (SPARK-56876) and the physical row 
layer (SPARK-56981) already exist, but these types are currently wired only 
through scattered legacy dispatch. Registering them in the Types Framework 
centralizes the type-specific operations behind `TypeOps`, consistent with 
`TimeType`, and is a prerequisite for the remaining nanosecond timestamp work.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. All registration is gated by the internal flag 
`spark.sql.types.framework.enabled`. When the flag is `false`, behavior is 
identical to the existing legacy paths.
   
   ### How was this patch tested?
   
   Added `TimestampNanosTypeOpsSuite`, covering NTZ and LTZ for p in {7, 8, 9}:
   - `TypeOps` / `TypeApiOps` registration when the framework is enabled.
   - `PhysicalDataType`, `Literal.default` value, and codegen Java class.
   - `InternalRow` and `SpecificInternalRow` set/read roundtrips, including the 
dedicated `MutableTimestampNanos` holder.
   - `getEncoder` reports `UNSUPPORTED_DATA_TYPE_FOR_ENCODER`.
   - `toSQLValue` uses the NTZ/LTZ literal prefix.
   - Framework-disabled fallback produces identical results.
   
   ```
   build/sbt 'catalyst/testOnly org.apache.spark.sql.catalyst.types.ops.TimestampNanosTypeOpsSuite'
   ```
   
   All 7 tests pass. The scalastyle checks for `catalyst` and `sql-api` are clean.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Cursor


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

