Re: [PR] [SPARK-57454][SQL] Add type coercion and widening rules for nanosecond-precision timestamp types [spark]

via GitHub Sun, 21 Jun 2026 14:21:40 -0700


stevomitric commented on code in PR #56638:
URL: https://github.com/apache/spark/pull/56638#discussion_r3449080369



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionHelper.scala:
##########
@@ -244,14 +247,54 @@ abstract class TypeCoercionHelper {
     (d1, d2) match {
       case (_, _: TimeType) => None
       case (_: TimeType, _) => None
-      case (_: TimestampType, _: DateType) | (_: DateType, _: TimestampType) =>
-        Some(TimestampType)
 
-      case (_: TimestampType, _: TimestampNTZType) | (_: TimestampNTZType, _: TimestampType) =>
-        Some(TimestampType)
-
-      case (_: TimestampNTZType, _: DateType) | (_: DateType, _: TimestampNTZType) =>
-        Some(TimestampNTZType)
+      // The remaining datetime types (DATE and the micro/nanos TIMESTAMP_LTZ / TIMESTAMP_NTZ
+      // families) widen along two independent axes:
+      //   - time-zone family: the result is LTZ if either input is LTZ-family, otherwise NTZ. This
+      //     mirrors the microsecond precedent where TIMESTAMP + TIMESTAMP_NTZ widens to TIMESTAMP.
+      //     DATE is family-neutral and adopts the family of the other side.
+      //   - precision: the maximum of the two precisions, where the micro types and DATE count as 6
+      //     and the nanos types contribute their own precision p in [7, 9].
+      // The (family, precision) pair then maps back to a concrete type: precision 6 yields the
+      // micro type, precision in [7, 9] yields the nanos type.
+      //
+      // Note: this common-type resolution is intentionally more permissive than the nanosecond
+      // conversion rules in Cast.canUpCast / Cast.canANSIStoreAssign, which keep cross-family and
+      // DATE <-> nanos casts explicit-CAST-only while the nanos types are unreleased (SPARK-57323
+      // etc.). Coercion here mirrors the microsecond precedent so that UNION / CASE / coalesce /
+      // IN / comparison resolve a common type the same way they do for the micro families; the
+      // stricter explicit-only stance is deliberately scoped to up-cast and store assignment, not
+      // to common-type resolution.
+      case _ =>
+        def isLtz(d: DatetimeType): Boolean =
+          d.isInstanceOf[TimestampType] || d.isInstanceOf[TimestampLTZNanosType]
+        def isNtz(d: DatetimeType): Boolean =
+          d.isInstanceOf[TimestampNTZType] || d.isInstanceOf[TimestampNTZNanosType]
+        def precisionOf(d: DatetimeType): Int = d match {
+          case t: TimestampLTZNanosType => t.precision
+          case t: TimestampNTZNanosType => t.precision
+          case _ => 6 // DateType / TimestampType / TimestampNTZType

Review Comment:
   Minor: the literal 6 (microsecond precision) appears here and again at lines
293/295 (p <= 6). A local val MicrosPrecision = 6 would centralize it and make
the [7, 9] nanos boundary self-documenting.
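
A minimal, standalone sketch of the two-axis widening described in the code comment above, with the suggested named constant in place of the bare 6. The types here (Dt, DateT, TsLtz, TsNtz, TsLtzNanos, TsNtzNanos) are hypothetical stand-ins, not the real catalyst classes, and the DATE + DATE fallback is an assumption made only so the model is total; it is not taken from the PR.

```scala
object WideningSketch {
  // Hypothetical stand-ins for Spark's datetime types (not the catalyst classes).
  sealed trait Dt
  case object DateT extends Dt                            // DATE
  case object TsLtz extends Dt                            // TIMESTAMP (micros, LTZ family)
  case object TsNtz extends Dt                            // TIMESTAMP_NTZ (micros, NTZ family)
  final case class TsLtzNanos(precision: Int) extends Dt  // precision in [7, 9]
  final case class TsNtzNanos(precision: Int) extends Dt  // precision in [7, 9]

  // Single source of truth for the micro/nanos boundary.
  val MicrosPrecision = 6

  def isLtz(d: Dt): Boolean = d match {
    case TsLtz | TsLtzNanos(_) => true
    case _ => false
  }

  def isNtz(d: Dt): Boolean = d match {
    case TsNtz | TsNtzNanos(_) => true
    case _ => false
  }

  def precisionOf(d: Dt): Int = d match {
    case TsLtzNanos(p) => p
    case TsNtzNanos(p) => p
    case _ => MicrosPrecision // DATE and the micro types
  }

  // Family axis: LTZ wins over NTZ; DATE adopts the other side's family.
  // Precision axis: maximum of the two precisions.
  def widen(a: Dt, b: Dt): Dt = {
    val p = math.max(precisionOf(a), precisionOf(b))
    if (isLtz(a) || isLtz(b)) {
      if (p <= MicrosPrecision) TsLtz else TsLtzNanos(p)
    } else if (isNtz(a) || isNtz(b)) {
      if (p <= MicrosPrecision) TsNtz else TsNtzNanos(p)
    } else {
      DateT // assumption: only reached for DATE + DATE in this model
    }
  }
}
```

With this model, widen(TsLtzNanos(7), TsNtz) yields TsLtzNanos(7) and widen(DateT, TsLtzNanos(8)) yields TsLtzNanos(8), matching the cells discussed in this review. Both the default precision and the micro/nanos split read from MicrosPrecision, so the boundary is stated once.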



##########
sql/core/src/test/resources/sql-tests/results/timestamp-ltz-nanos.sql.out:
##########
@@ -854,3 +854,140 @@ SELECT unix_nanos(NULL :: timestamp_ltz(9))
 struct<unix_nanos(CAST(NULL AS TIMESTAMP_LTZ(9))):decimal(21,0)>
 -- !query output
 NULL
+
+
+-- !query
+SELECT typeof(c), c FROM (
+    SELECT TIMESTAMP_LTZ '0001-01-01 00:00:00' AS c
+    UNION ALL SELECT TIMESTAMP_LTZ '9999-12-31 23:59:59.999999999') ORDER BY c
+-- !query schema
+struct<typeof(c):string,c:timestamp_ltz(9)>
+-- !query output
+timestamp_ltz(9)       0001-01-01 00:00:00
+timestamp_ltz(9)       9999-12-31 23:59:59.999999999
+
+
+-- !query
+SELECT typeof(c), c FROM (
+    SELECT '1582-10-04 12:30:45.1234567' :: timestamp_ltz(7) AS c
+    UNION ALL SELECT '1582-10-15 23:59:59.123456789' :: timestamp_ltz(9)) ORDER BY c
+-- !query schema
+struct<typeof(c):string,c:timestamp_ltz(9)>
+-- !query output
+timestamp_ltz(9)       1582-10-04 12:30:45.1234567
+timestamp_ltz(9)       1582-10-15 23:59:59.123456789
+
+
+-- !query
+SELECT typeof(v), v FROM (SELECT coalesce(
+    '1969-12-31 23:59:59.0000001 Asia/Kolkata' :: timestamp_ltz(7),
+    '1969-12-31 23:59:59.999999999 UTC' :: timestamp_ltz(9)) AS v)
+-- !query schema
+struct<typeof(v):string,v:timestamp_ltz(9)>
+-- !query output
+timestamp_ltz(9)       1969-12-31 10:29:59.0000001
+
+
+-- !query
+SELECT typeof(v), v FROM (SELECT CASE WHEN true
+    THEN TIMESTAMP_LTZ '2026-06-21 10:16:30 Asia/Kathmandu'
+    ELSE '2026-06-21 10:16:30.987654321 UTC' :: timestamp_ltz(9) END AS v)
+-- !query schema
+struct<typeof(v):string,v:timestamp_ltz(9)>
+-- !query output
+timestamp_ltz(9)       2026-06-20 21:31:30
+
+
+-- !query
+SELECT typeof(v), v FROM (SELECT coalesce(
+    DATE '0001-01-01', '2020-01-01 00:00:00.12345678' :: timestamp_ltz(8)) AS v)
+-- !query schema
+struct<typeof(v):string,v:timestamp_ltz(8)>
+-- !query output
+timestamp_ltz(8)       0001-01-01 00:00:00
+
+
+-- !query
+SELECT typeof(greatest(TIMESTAMP_LTZ '0001-01-01 00:00:00',
+    '9999-12-31 23:59:59.999999999' :: timestamp_ltz(9)))
+-- !query schema
+struct<typeof(greatest(TIMESTAMP '0001-01-01 00:00:00', CAST(9999-12-31 23:59:59.999999999 AS TIMESTAMP_LTZ(9)))):string>
+-- !query output
+timestamp_ltz(9)
+
+
+-- !query
+SELECT greatest(TIMESTAMP_LTZ '1500-03-01 12:00:00',
+    '1582-10-15 00:00:00.123456789' :: timestamp_ltz(9),
+    TIMESTAMP_LTZ '2026-06-21 10:16:30.5')
+-- !query schema
+struct<greatest(TIMESTAMP '1500-03-01 12:00:00', CAST(1582-10-15 00:00:00.123456789 AS TIMESTAMP_LTZ(9)), TIMESTAMP '2026-06-21 10:16:30.5'):timestamp_ltz(9)>
+-- !query output
+2026-06-21 10:16:30.5
+
+
+-- !query
+SELECT least('1970-01-01 00:00:00.0000001' :: timestamp_ltz(7),
+    '1969-12-31 23:59:59.999999999' :: timestamp_ltz(9))
+-- !query schema
+struct<least(CAST(1970-01-01 00:00:00.0000001 AS TIMESTAMP_LTZ(7)), CAST(1969-12-31 23:59:59.999999999 AS TIMESTAMP_LTZ(9))):timestamp_ltz(9)>
+-- !query output
+1969-12-31 23:59:59.999999999
+
+
+-- !query
+SELECT array('0001-01-01 00:00:00.0000001' :: timestamp_ltz(7),
+    TIMESTAMP_LTZ '2026-06-21 10:16:30 Asia/Kolkata',
+    '9999-12-31 23:59:59.999999999' :: timestamp_ltz(9))
+-- !query schema
+struct<array(CAST(0001-01-01 00:00:00.0000001 AS TIMESTAMP_LTZ(7)), TIMESTAMP '2026-06-20 21:46:30', CAST(9999-12-31 23:59:59.999999999 AS TIMESTAMP_LTZ(9))):array<timestamp_ltz(9)>>
+-- !query output
+[0001-01-01 00:00:00.0000001,2026-06-20 21:46:30,9999-12-31 23:59:59.999999999]
+
+
+-- !query
+SELECT typeof(array(TIMESTAMP_LTZ '9999-12-31 23:59:59',
+    '0001-01-01 00:00:00.000000001' :: timestamp_ltz(9)))
+-- !query schema
+struct<typeof(array(TIMESTAMP '9999-12-31 23:59:59', CAST(0001-01-01 00:00:00.000000001 AS TIMESTAMP_LTZ(9)))):string>
+-- !query output
+array<timestamp_ltz(9)>
+
+
+-- !query
+SELECT map('min', '0001-01-01 00:00:00.000000001' :: timestamp_ltz(9),
+    'max', TIMESTAMP_LTZ '9999-12-31 23:59:59.999999')
+-- !query schema
+struct<map(min, CAST(0001-01-01 00:00:00.000000001 AS TIMESTAMP_LTZ(9)), max, TIMESTAMP '9999-12-31 23:59:59.999999'):map<string,timestamp_ltz(9)>>
+-- !query output
+{"max":9999-12-31 23:59:59.999999,"min":0001-01-01 00:00:00.000000001}
+
+
+-- !query
+SELECT typeof(c) FROM (
+    SELECT TIMESTAMP_NTZ '1582-10-15 00:00:00' AS c

Review Comment:
   The mixed-family UNION/coalesce/CASE goldens assert only typeof(...). That
is reasonable, since the value depends on the session zone, but it means the
one place this PR introduces an implicit cross-family conversion never has its
zone-shifted value locked. If the inserted Cast's sessionLocalTimeZone wiring
ever regressed, these tests would still pass. Could you add one mixed-family
case pinned to a deterministic source zone (as the same-family Asia/Kolkata
coalesce already does), or is type-only sufficient for the preview?



##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercionSuite.scala:
##########
@@ -219,6 +228,25 @@ class AnsiTypeCoercionSuite extends TypeCoercionSuiteBase {
     widenTest(IntegerType, TimestampType, None)
     widenTest(StringType, TimestampType, None)
 
+    // Nanosecond-precision timestamp types (SPARK-57454).

Review Comment:
   This ANSI block is a strict subset of the non-ANSI one in TypeCoercionSuite
(~lines 656–674). It is missing three cells: TimestampNTZNanosType(9) +
TimeType(6) → None, the mixed-family-with-micro cell TimestampLTZNanosType(7) +
TimestampNTZType → TimestampLTZNanosType(7), and the nanos(8) + nanos(8)
self-pair. findWiderDateTimeType is shared, so the risk is low, but the
asymmetry looks accidental and weakens the "both ANSI modes" claim for those
cells. Could you mirror the three, or add a comment if the omission is
intentional?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

