MaxGekk opened a new pull request, #56269:
URL: https://github.com/apache/spark/pull/56269

   ### What changes were proposed in this pull request?
   
   This PR makes all built-in file datasources and JDBC explicitly **reject** 
the nanosecond-capable timestamp types `TimestampNTZNanosType` and 
`TimestampLTZNanosType` on both the read and write paths, raising the existing 
`UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE` error.
   
   An explicit arm is added before the `case _: AtomicType => true` catch-all 
in each datasource's `supportDataType` / `supportsDataType`:
   
   ```scala
   // Nanosecond-capable timestamps are not yet supported by this datasource.
   case _: TimestampNTZNanosType | _: TimestampLTZNanosType => false
   ```
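   
   The effect of the arm's position can be seen in a minimal, self-contained 
   sketch. The types below are toy stand-ins that only mimic the shape of 
   Spark's hierarchy (they are not the real `org.apache.spark.sql.types` 
   classes), and `SupportCheck`, `supportDataTypeBefore`, and 
   `supportDataTypeAfter` are hypothetical names used for illustration only:
   
   ```scala
   // Toy stand-ins for the type hierarchy (assumption: NOT the real Spark classes).
   sealed trait DataType
   sealed trait AtomicType extends DataType
   case object StringType extends AtomicType
   case object TimestampType extends AtomicType
   final case class TimestampNTZNanosType(precision: Int) extends AtomicType
   final case class TimestampLTZNanosType(precision: Int) extends AtomicType
   final case class ArrayType(elementType: DataType) extends DataType

   object SupportCheck {
     // Without the explicit arm: the nanos types are AtomicTypes, so they
     // fall into the catch-all and are accepted.
     def supportDataTypeBefore(dt: DataType): Boolean = dt match {
       case _: AtomicType => true
       case ArrayType(elementType) => supportDataTypeBefore(elementType)
     }

     // With the explicit arm placed first: pattern matching tries arms in
     // order, so the nanos types are rejected before the catch-all is reached.
     def supportDataTypeAfter(dt: DataType): Boolean = dt match {
       case _: TimestampNTZNanosType | _: TimestampLTZNanosType => false
       case _: AtomicType => true
       case ArrayType(elementType) => supportDataTypeAfter(elementType)
     }
   }
   ```
   
   In this sketch the rejection also applies to a nanos type nested inside a 
   container type, because the recursive arm ends up at the same explicit arm.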
   
   Sources changed:
   - V1 `FileFormat.supportDataType`: `ParquetFileFormat`, `OrcFileFormat` 
(native), `JsonFileFormat`, `XmlFileFormat`, and the private 
`CSVFileFormat.supportDataType(dataType, allowVariant)` (covers both 
`supportDataType` and `supportReadDataType`).
   - `AvroUtils.supportsDataType` (single edit covering V1 `AvroFileFormat` and 
V2 `AvroTable`).
   - `sql/hive` `OrcFileFormat.supportDataType` (Hive ORC serde).
   - V2 `FileTable.supportsDataType`: `ParquetTable`, `OrcTable`, `JsonTable`, 
`CSVTable`.
   
   No code change was needed for:
   - Text (string-only, already rejects nanos).
   - JDBC read (`getCatalystType` maps JDBC `TIMESTAMP`/`TIME` to microsecond 
types only).
   - JDBC write (already rejected via the `CreatableRelationProvider` type 
whitelist and `JdbcUtils.getCommonJDBCType` returning `None`); a test is added 
to lock the behavior.
   - XML V2: there is no `XmlTable`; XML is V1-only, so the `XmlFileFormat` 
edit covers all XML paths.
   
   The rejection is **unconditional** (independent of the 
`spark.sql.timestampNanosTypes.enabled` preview flag). The flag governs only 
whether the type may exist; these sources have no real nanos read/write support 
yet. As each source adds support later (e.g. Parquet read via SPARK-57102), it 
can carve out its own exception.
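   
   As a rough illustration of what such a per-source carve-out might look like, 
   the sketch below uses toy types again. `ToySource`, `ToyJson`, and 
   `ToyParquet` are hypothetical and do not correspond to Spark classes, and 
   the read/write split is an assumption made for the example. Note that the 
   check never consults the preview flag:
   
   ```scala
   // Toy stand-ins (assumption: NOT the real Spark classes).
   sealed trait DataType
   sealed trait AtomicType extends DataType
   case object StringType extends AtomicType
   final case class TimestampNTZNanosType(precision: Int) extends AtomicType
   final case class TimestampLTZNanosType(precision: Int) extends AtomicType

   // Hypothetical source abstraction with separate read and write checks.
   trait ToySource {
     def name: String

     // Default: nanos types are rejected on both paths, unconditionally.
     // No configuration flag is read here.
     def supportsRead(dt: DataType): Boolean = dt match {
       case _: TimestampNTZNanosType | _: TimestampLTZNanosType => false
       case _: AtomicType => true
     }
     def supportsWrite(dt: DataType): Boolean = supportsRead(dt)
   }

   // A source without nanos support keeps the default behavior.
   object ToyJson extends ToySource {
     val name = "json"
   }

   // A source that later gains read support overrides only the read check;
   // the write path still goes through the default rejection.
   object ToyParquet extends ToySource {
     val name = "parquet"
     private def defaultCheck(dt: DataType): Boolean = super.supportsRead(dt)
     override def supportsRead(dt: DataType): Boolean = dt match {
       case _: TimestampNTZNanosType | _: TimestampLTZNanosType => true
       case other => defaultCheck(other)
     }
     override def supportsWrite(dt: DataType): Boolean = defaultCheck(dt)
   }
   ```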
   
   This is a sub-task of 
[SPARK-56822](https://issues.apache.org/jira/browse/SPARK-56822) (SPIP: 
Timestamps with nanosecond precision).
   
   ### Why are the changes needed?
   
   The nanos types extend `DatetimeType`, which in turn extends `AtomicType`. 
When the preview flag `spark.sql.timestampNanosTypes.enabled` is on, every file 
source's `case _: AtomicType => true` catch-all therefore silently **accepts** 
them and then misbehaves at read/write time because no real support exists yet. 
Users get confusing downstream failures or silent precision issues instead of a 
clear, actionable error. This PR provides a clear guardrail until real support 
is implemented per source.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, but only for the preview feature gated by 
`spark.sql.timestampNanosTypes.enabled` (disabled by default in production). 
With the flag enabled, writing or reading a `TIMESTAMP_NTZ(p)` / 
`TIMESTAMP_LTZ(p)` (p in [7, 9]) column through Parquet, ORC, Avro, JSON, or 
CSV (v1 and v2), XML (V1-only), or Hive ORC now fails with 
`UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE`. Previously these sources silently 
accepted the types and then misbehaved. JDBC write was already rejected with a 
clear error and is now covered by a test. Behavior for all other types is 
unchanged.
   
   ### How was this patch tested?
   
   Added unit tests:
   - `FileBasedDataSourceSuite`: a new test that, with 
`TIMESTAMP_NANOS_TYPES_ENABLED=true`, iterates over v1 and v2 
(`USE_V1_SOURCE_LIST`) and the built-in formats (parquet, orc, json, csv, xml), 
asserting that both write and read of a `TIMESTAMP_NTZ(9)` / `TIMESTAMP_LTZ(9)` 
column fail with `UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE`. The nanos column is 
built from a typed literal rather than via CAST.
   - `AvroSuite`: an equivalent write+read assertion for Avro.
   - `JDBCWriteSuite`: a new test asserting that writing a nanos column via 
`.write.jdbc(...)` is rejected.
   
   All three suites pass locally, and `sql/core`, `sql/hive`, and 
`connector/avro` test sources compile.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Cursor (Claude Opus 4.8)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

