MaxGekk opened a new pull request, #56622:
URL: https://github.com/apache/spark/pull/56622

   ### What changes were proposed in this pull request?
   
   This PR extends the fractional-seconds precision of the `TIME` data type 
from a maximum of 6 (microseconds) to 9 (nanoseconds), so `TIME(p)` accepts `0 
<= p <= 9`. The internal storage is already nanoseconds-since-midnight (`Long`, 
`TimeType.NANOS_PRECISION = 9` already exists), so this PR lifts the cap and 
closes the two remaining micros-only code paths:
   
   - `TimeType.MAX_PRECISION` is raised from 6 to 9 (`DEFAULT_PRECISION` stays 
6). The `UNSUPPORTED_TIME_PRECISION` error message and related 
scaladoc/comments are updated to `[0, 9]`.
   - `SparkDateTimeUtils.stringToTime` now keeps the sub-microsecond digits 
(7-9), mirroring the timestamp parser, and `CAST(<string> AS TIME(p))` 
truncates the parsed value to the target precision (interpreted and codegen 
paths).
   - `CurrentTime` accepts precisions up to `MAX_PRECISION`.
   - Parquet I/O writes/reads `TIME(NANOS)` for precisions 7..9 (and keeps 
`TIME(MICROS)` for 0..6) across `TimeTypeParquetOps`, `ParquetSchemaConverter`, 
the vectorized `ParquetVectorUpdaterFactory`, and the legacy row/write 
fallbacks. ORC and Avro already store the raw nanosecond `Long` with the 
catalyst type name preserved, so they round-trip 7..9 without production 
changes.
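
   The parsing and Parquet bullets above can be illustrated with a small standalone Python sketch. It is not the Spark implementation: the function names, the regular expression, and the accepted input format are assumptions made for illustration. It only shows how fractional digits 7-9 survive when the value is stored as nanoseconds since midnight, and how the Parquet unit follows the declared precision.

```python
import re

NANOS_PER_SECOND = 1_000_000_000

# Hypothetical input format: HH:MM:SS with an optional fraction of 1 to 9 digits.
_TIME_RE = re.compile(r"^(\d{1,2}):(\d{1,2}):(\d{1,2})(?:\.(\d{1,9}))?$")


def parse_time_to_nanos(text: str) -> int:
    """Parse 'HH:MM:SS[.fffffffff]' into nanoseconds since midnight."""
    match = _TIME_RE.match(text.strip())
    if match is None:
        raise ValueError(f"cannot parse time: {text!r}")
    hours, minutes, seconds = (int(match.group(i)) for i in (1, 2, 3))
    if hours > 23 or minutes > 59 or seconds > 59:
        raise ValueError(f"time field out of range: {text!r}")
    # Right-pad the fraction to 9 digits so that digits 7-9 are kept
    # instead of being dropped at microsecond resolution.
    fraction = (match.group(4) or "").ljust(9, "0")
    whole_seconds = (hours * 60 + minutes) * 60 + seconds
    return whole_seconds * NANOS_PER_SECOND + int(fraction)


def parquet_time_unit(precision: int) -> str:
    """Pick the Parquet TIME unit for a declared precision (0..9)."""
    if not 0 <= precision <= 9:
        raise ValueError(f"unsupported TIME precision: {precision}")
    # Precisions 0..6 keep TIME(MICROS); 7..9 need TIME(NANOS).
    return "MICROS" if precision <= 6 else "NANOS"
```

   Under these assumptions, `parse_time_to_nanos("12:34:56.123456789")` returns `45296123456789`, and `parquet_time_unit(7)` returns `"NANOS"`.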
   
   Casts that were already nanosecond-aware (`TIME(p1) -> TIME(p2)`, `TIME -> 
DECIMAL`, `TIME -> integral`, `TIME -> STRING`) work for 7..9 once the cap is 
lifted.
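
   As a rough illustration of a narrowing `TIME(p1) -> TIME(p2)` cast on a nanoseconds-since-midnight value, here is a minimal Python sketch. It assumes the cast truncates rather than rounds, matching the truncation described for `CAST(<string> AS TIME(p))`; the function name is hypothetical and the code is not taken from Spark.

```python
MAX_PRECISION = 9  # nanoseconds


def truncate_to_precision(nanos: int, precision: int) -> int:
    """Drop fractional digits beyond `precision` from a nanos-since-midnight value."""
    if not 0 <= precision <= MAX_PRECISION:
        raise ValueError(f"unsupported TIME precision: {precision}")
    if nanos < 0:
        raise ValueError("nanoseconds since midnight must be non-negative")
    # One unit of the target precision, expressed in nanoseconds:
    # precision 6 -> 1000 ns, precision 7 -> 100 ns, precision 9 -> 1 ns.
    unit = 10 ** (MAX_PRECISION - precision)
    return nanos - nanos % unit
```

   For the value `45296123456789` (12:34:56.123456789), truncating to precision 7 gives `45296123456700` and truncating to precision 6 gives `45296123456000`; precision 9 leaves the value unchanged.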
   
   Out of scope (tracked separately): SQL golden files (SPARK-57563), casts 
to/from TIMESTAMP types (SPARK-57552 / SPARK-57554), and `TIME WITH TIME ZONE` 
(SPARK-51162).
   
   ### Why are the changes needed?
   
   ANSI SQL (ISO/IEC 9075-2, 6.1 `<data type>`) makes the maximum `<time 
precision>` implementation-defined with the sole constraint that it is not less 
than 6, and requires the maximum of `<time precision>` and `<timestamp 
precision>` to be the same implementation-defined value. Spark already supports 
nanosecond timestamps, so to stay ANSI-consistent `TIME` must reach precision 9 
in lockstep.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. `TIME(7)`, `TIME(8)`, and `TIME(9)` can now be declared, parsed, used 
as literals, and round-tripped through Parquet/ORC/Avro. Previously these 
precisions raised `UNSUPPORTED_TIME_PRECISION`. The default precision of `TIME` 
is unchanged (6).
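
   The behavior change can be pictured as a widened range check. The following Python sketch is only a stand-in for the real `TimeType` validation: the exception class and function name are hypothetical, and the constants merely mirror the values stated in this description.

```python
MIN_PRECISION = 0
DEFAULT_PRECISION = 6  # unchanged by this PR
MAX_PRECISION = 9      # 6 before this PR


class UnsupportedTimePrecision(ValueError):
    """Hypothetical stand-in for the UNSUPPORTED_TIME_PRECISION error."""


def time_type_name(precision: int = DEFAULT_PRECISION) -> str:
    """Return the type name for TIME(p), rejecting precisions outside [0, 9]."""
    if not MIN_PRECISION <= precision <= MAX_PRECISION:
        raise UnsupportedTimePrecision(
            f"precision {precision} is outside [{MIN_PRECISION}, {MAX_PRECISION}]"
        )
    return f"TIME({precision})"
```

   With `MAX_PRECISION = 6`, a call such as `time_type_name(7)` would have been rejected; with the cap at 9 it is accepted, while `time_type_name()` still resolves to `TIME(6)`.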
   
   ### How was this patch tested?
   
   Extended existing TIME suites to cover precision 7..9 
(`DataTypeParserSuite`, `DataTypeSuite`, `TimeExpressionsSuite`, 
`CastWithAnsiOn/OffSuite`, `TimeFormatterSuite`, `DateTimeUtilsSuite`, 
CSV/JSON/XML expression and function suites, `TimeTypeParquetOpsSuite`, 
`ParquetIOSuite`, `OrcQuerySuite`, `AvroSuite`/`AvroFunctionsSuite`, 
`PartitionedWriteSuite`, `RowJsonSuite`, `SparkConnectPlannerSuite`) and added 
nanosecond Parquet read and round-trip tests (covering both the vectorized and 
non-vectorized readers). Ran the affected suites locally; all pass. 
`dev/scalastyle` and scalafmt are clean.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Cursor


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

