stevomitric commented on code in PR #56638:
URL: https://github.com/apache/spark/pull/56638#discussion_r3449080369
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionHelper.scala:
##########
@@ -244,14 +247,54 @@ abstract class TypeCoercionHelper {
(d1, d2) match {
case (_, _: TimeType) => None
case (_: TimeType, _) => None
- case (_: TimestampType, _: DateType) | (_: DateType, _: TimestampType) =>
- Some(TimestampType)
- case (_: TimestampType, _: TimestampNTZType) | (_: TimestampNTZType, _: TimestampType) =>
- Some(TimestampType)
-
- case (_: TimestampNTZType, _: DateType) | (_: DateType, _: TimestampNTZType) =>
- Some(TimestampNTZType)
+ // The remaining datetime types (DATE and the micro/nanos TIMESTAMP_LTZ / TIMESTAMP_NTZ
+ // families) widen along two independent axes:
+ // - time-zone family: the result is LTZ if either input is LTZ-family, otherwise NTZ. This
+ // mirrors the microsecond precedent where TIMESTAMP + TIMESTAMP_NTZ widens to TIMESTAMP.
+ // DATE is family-neutral and adopts the family of the other side.
+ // - precision: the maximum of the two precisions, where the micro types and DATE count as 6
+ // and the nanos types contribute their own precision p in [7, 9].
+ // The (family, precision) pair then maps back to a concrete type: precision 6 yields the
+ // micro type, precision in [7, 9] yields the nanos type.
+ //
+ // Note: this common-type resolution is intentionally more permissive than the nanosecond
+ // conversion rules in Cast.canUpCast / Cast.canANSIStoreAssign, which keep cross-family and
+ // DATE <-> nanos casts explicit-CAST-only while the nanos types are unreleased (SPARK-57323
+ // etc.). Coercion here mirrors the microsecond precedent so that UNION / CASE / coalesce /
+ // IN / comparison resolve a common type the same way they do for the micro families; the
+ // stricter explicit-only stance is deliberately scoped to up-cast and store assignment, not
+ // to common-type resolution.
+ case _ =>
+ def isLtz(d: DatetimeType): Boolean =
+ d.isInstanceOf[TimestampType] || d.isInstanceOf[TimestampLTZNanosType]
+ def isNtz(d: DatetimeType): Boolean =
+ d.isInstanceOf[TimestampNTZType] || d.isInstanceOf[TimestampNTZNanosType]
+ def precisionOf(d: DatetimeType): Int = d match {
+ case t: TimestampLTZNanosType => t.precision
+ case t: TimestampNTZNanosType => t.precision
+ case _ => 6 // DateType / TimestampType / TimestampNTZType
Review Comment:
Minor: the literal 6 (microsecond precision) appears here and again at lines
293/295 (p <= 6). A local val MicrosPrecision = 6 would centralize it and make
the [7, 9] boundary self-documenting.
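
A minimal sketch of that suggestion, using simplified stand-in types rather
than Spark's real DataType classes. The names WidenSketch, Dt, Date, TsMicros,
TsNanos, Family and widen are all hypothetical. It models the two-axis rule
from the code comment with a single named constant for the microsecond
precision:

```scala
// Stand-in model, NOT Spark's DataType hierarchy; every name here is illustrative.
object WidenSketch {
  sealed trait Family
  case object Ltz extends Family
  case object Ntz extends Family

  sealed trait Dt
  case object Date extends Dt
  final case class TsMicros(family: Family) extends Dt
  final case class TsNanos(family: Family, precision: Int) extends Dt {
    require(precision >= 7 && precision <= 9, s"nanos precision must be in [7, 9]: $precision")
  }

  // Single source of truth for the micro/nanos boundary.
  val MicrosPrecision = 6

  private def isLtz(d: Dt): Boolean = d match {
    case TsMicros(Ltz) | TsNanos(Ltz, _) => true
    case _ => false
  }

  private def precisionOf(d: Dt): Int = d match {
    case TsNanos(_, p) => p
    case _ => MicrosPrecision // Date and the micro types
  }

  // Widens along the two axes: LTZ wins the family, the larger precision wins.
  def widen(a: Dt, b: Dt): Dt =
    if (a == Date && b == Date) Date
    else {
      val family: Family = if (isLtz(a) || isLtz(b)) Ltz else Ntz
      val p = math.max(precisionOf(a), precisionOf(b))
      if (p <= MicrosPrecision) TsMicros(family) else TsNanos(family, p)
    }
}
```

Under this model, widen(TsNanos(Ltz, 7), TsMicros(Ntz)) gives TsNanos(Ltz, 7)
and widen(Date, TsMicros(Ntz)) gives TsMicros(Ntz); the constant is the only
place the value 6 is spelled out.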
##########
sql/core/src/test/resources/sql-tests/results/timestamp-ltz-nanos.sql.out:
##########
@@ -854,3 +854,140 @@ SELECT unix_nanos(NULL :: timestamp_ltz(9))
struct<unix_nanos(CAST(NULL AS TIMESTAMP_LTZ(9))):decimal(21,0)>
-- !query output
NULL
+
+
+-- !query
+SELECT typeof(c), c FROM (
+ SELECT TIMESTAMP_LTZ '0001-01-01 00:00:00' AS c
+ UNION ALL SELECT TIMESTAMP_LTZ '9999-12-31 23:59:59.999999999') ORDER BY c
+-- !query schema
+struct<typeof(c):string,c:timestamp_ltz(9)>
+-- !query output
+timestamp_ltz(9) 0001-01-01 00:00:00
+timestamp_ltz(9) 9999-12-31 23:59:59.999999999
+
+
+-- !query
+SELECT typeof(c), c FROM (
+ SELECT '1582-10-04 12:30:45.1234567' :: timestamp_ltz(7) AS c
+ UNION ALL SELECT '1582-10-15 23:59:59.123456789' :: timestamp_ltz(9)) ORDER BY c
+-- !query schema
+struct<typeof(c):string,c:timestamp_ltz(9)>
+-- !query output
+timestamp_ltz(9) 1582-10-04 12:30:45.1234567
+timestamp_ltz(9) 1582-10-15 23:59:59.123456789
+
+
+-- !query
+SELECT typeof(v), v FROM (SELECT coalesce(
+ '1969-12-31 23:59:59.0000001 Asia/Kolkata' :: timestamp_ltz(7),
+ '1969-12-31 23:59:59.999999999 UTC' :: timestamp_ltz(9)) AS v)
+-- !query schema
+struct<typeof(v):string,v:timestamp_ltz(9)>
+-- !query output
+timestamp_ltz(9) 1969-12-31 10:29:59.0000001
+
+
+-- !query
+SELECT typeof(v), v FROM (SELECT CASE WHEN true
+ THEN TIMESTAMP_LTZ '2026-06-21 10:16:30 Asia/Kathmandu'
+ ELSE '2026-06-21 10:16:30.987654321 UTC' :: timestamp_ltz(9) END AS v)
+-- !query schema
+struct<typeof(v):string,v:timestamp_ltz(9)>
+-- !query output
+timestamp_ltz(9) 2026-06-20 21:31:30
+
+
+-- !query
+SELECT typeof(v), v FROM (SELECT coalesce(
+ DATE '0001-01-01', '2020-01-01 00:00:00.12345678' :: timestamp_ltz(8)) AS v)
+-- !query schema
+struct<typeof(v):string,v:timestamp_ltz(8)>
+-- !query output
+timestamp_ltz(8) 0001-01-01 00:00:00
+
+
+-- !query
+SELECT typeof(greatest(TIMESTAMP_LTZ '0001-01-01 00:00:00',
+ '9999-12-31 23:59:59.999999999' :: timestamp_ltz(9)))
+-- !query schema
+struct<typeof(greatest(TIMESTAMP '0001-01-01 00:00:00', CAST(9999-12-31 23:59:59.999999999 AS TIMESTAMP_LTZ(9)))):string>
+-- !query output
+timestamp_ltz(9)
+
+
+-- !query
+SELECT greatest(TIMESTAMP_LTZ '1500-03-01 12:00:00',
+ '1582-10-15 00:00:00.123456789' :: timestamp_ltz(9),
+ TIMESTAMP_LTZ '2026-06-21 10:16:30.5')
+-- !query schema
+struct<greatest(TIMESTAMP '1500-03-01 12:00:00', CAST(1582-10-15 00:00:00.123456789 AS TIMESTAMP_LTZ(9)), TIMESTAMP '2026-06-21 10:16:30.5'):timestamp_ltz(9)>
+-- !query output
+2026-06-21 10:16:30.5
+
+
+-- !query
+SELECT least('1970-01-01 00:00:00.0000001' :: timestamp_ltz(7),
+ '1969-12-31 23:59:59.999999999' :: timestamp_ltz(9))
+-- !query schema
+struct<least(CAST(1970-01-01 00:00:00.0000001 AS TIMESTAMP_LTZ(7)), CAST(1969-12-31 23:59:59.999999999 AS TIMESTAMP_LTZ(9))):timestamp_ltz(9)>
+-- !query output
+1969-12-31 23:59:59.999999999
+
+
+-- !query
+SELECT array('0001-01-01 00:00:00.0000001' :: timestamp_ltz(7),
+ TIMESTAMP_LTZ '2026-06-21 10:16:30 Asia/Kolkata',
+ '9999-12-31 23:59:59.999999999' :: timestamp_ltz(9))
+-- !query schema
+struct<array(CAST(0001-01-01 00:00:00.0000001 AS TIMESTAMP_LTZ(7)), TIMESTAMP '2026-06-20 21:46:30', CAST(9999-12-31 23:59:59.999999999 AS TIMESTAMP_LTZ(9))):array<timestamp_ltz(9)>>
+-- !query output
+[0001-01-01 00:00:00.0000001,2026-06-20 21:46:30,9999-12-31 23:59:59.999999999]
+
+
+-- !query
+SELECT typeof(array(TIMESTAMP_LTZ '9999-12-31 23:59:59',
+ '0001-01-01 00:00:00.000000001' :: timestamp_ltz(9)))
+-- !query schema
+struct<typeof(array(TIMESTAMP '9999-12-31 23:59:59', CAST(0001-01-01 00:00:00.000000001 AS TIMESTAMP_LTZ(9)))):string>
+-- !query output
+array<timestamp_ltz(9)>
+
+
+-- !query
+SELECT map('min', '0001-01-01 00:00:00.000000001' :: timestamp_ltz(9),
+ 'max', TIMESTAMP_LTZ '9999-12-31 23:59:59.999999')
+-- !query schema
+struct<map(min, CAST(0001-01-01 00:00:00.000000001 AS TIMESTAMP_LTZ(9)), max, TIMESTAMP '9999-12-31 23:59:59.999999'):map<string,timestamp_ltz(9)>>
+-- !query output
+{"max":9999-12-31 23:59:59.999999,"min":0001-01-01 00:00:00.000000001}
+
+
+-- !query
+SELECT typeof(c) FROM (
+ SELECT TIMESTAMP_NTZ '1582-10-15 00:00:00' AS c
Review Comment:
The mixed-family UNION/coalesce/CASE goldens assert only typeof(...). That is
reasonable, since the value depends on the session time zone, but it means the
one place where this PR introduces an implicit cross-family conversion never
has its zone-shifted value locked. If the sessionLocalTimeZone wiring of the
inserted Cast ever regressed, these tests would still pass. Could you add one
mixed-family case pinned to a deterministic source zone (as the same-family
Asia/Kolkata coalesce already does), or is type-only sufficient for the preview?
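
To illustrate why the value matters, here is a small sketch using plain
java.time rather than Spark's Cast. It assumes the golden files run with
session zone America/Los_Angeles, which is only inferred from the existing
output (23:59:59 Asia/Kolkata rendered as 10:29:59); the names
ZoneShiftSketch, render and ntzToInstant are hypothetical.

```scala
import java.time.{Instant, LocalDateTime, ZoneId, ZonedDateTime}

object ZoneShiftSketch {
  // Assumed session zone; inferred from the existing golden output, not stated in the PR.
  val sessionZone: ZoneId = ZoneId.of("America/Los_Angeles")

  // An LTZ literal with an explicit source zone denotes one fixed instant.
  val pinned: ZonedDateTime =
    LocalDateTime.parse("1969-12-31T23:59:59.0000001").atZone(ZoneId.of("Asia/Kolkata"))

  // Wall-clock rendering of that instant in a given session zone.
  def render(z: ZonedDateTime, session: ZoneId): LocalDateTime =
    z.withZoneSameInstant(session).toLocalDateTime

  // An NTZ value carries no zone; converting it to an instant interprets it in
  // the session zone, so the result changes when the session zone changes.
  def ntzToInstant(ntz: LocalDateTime, session: ZoneId): Instant =
    ntz.atZone(session).toInstant
}
```

With the assumed session zone, render(pinned, sessionZone) is
1969-12-31T10:29:59.0000001, matching the existing same-family golden, while
rendering the same instant with UTC as the session zone gives 18:29:59.0000001.
A golden that records the value would therefore fail if the wrong zone were
wired in, whereas a typeof-only golden would not.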
##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercionSuite.scala:
##########
@@ -219,6 +228,25 @@ class AnsiTypeCoercionSuite extends TypeCoercionSuiteBase {
widenTest(IntegerType, TimestampType, None)
widenTest(StringType, TimestampType, None)
+ // Nanosecond-precision timestamp types (SPARK-57454).
Review Comment:
This ANSI block is a strict subset of the non-ANSI one in TypeCoercionSuite
(~lines 656–674). It is missing three cells: TimestampNTZNanosType(9) +
TimeType(6) → None; the mixed-family-with-micro cell TimestampLTZNanosType(7) +
TimestampNTZType → TimestampLTZNanosType(7); and the nanos(8) + nanos(8)
self-pair. findWiderDateTimeType is shared, so the risk is low, but the
asymmetry looks accidental and weakens the "both ANSI modes" coverage for those
cells. Could you mirror the three, or add a comment if the omission is
intentional?
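
One way to keep the two suites from drifting is to define the nanosecond cells
once and run them from both. The sketch below shows the idea with stand-in
types and a stand-in widening function; SharedCellsSketch, T, Ts, Time,
nanosCells, failures and toyWiden are hypothetical names, and the real suites
would call their existing widenTest helper instead.

```scala
object SharedCellsSketch {
  // Stand-in types: family 'L' (LTZ) or 'N' (NTZ); precision 6 means the micro type.
  sealed trait T
  final case class Ts(family: Char, precision: Int) extends T
  final case class Time(precision: Int) extends T

  // The three cells named in the review, as (left, right, expected).
  // The family of the nanos(8) self-pair is not given there; NTZ is an arbitrary choice.
  val nanosCells: Seq[(T, T, Option[T])] = Seq(
    (Ts('N', 9), Time(6), None),
    (Ts('L', 7), Ts('N', 6), Some(Ts('L', 7))),
    (Ts('N', 8), Ts('N', 8), Some(Ts('N', 8)))
  )

  // Runs every cell in both argument orders and returns one message per mismatch.
  def failures(widen: (T, T) => Option[T]): Seq[String] =
    for {
      (a, b, expected) <- nanosCells
      (x, y) <- Seq((a, b), (b, a))
      actual = widen(x, y)
      if actual != expected
    } yield s"widen($x, $y) = $actual, expected $expected"

  // Stand-in for the shared widening logic; not Spark's findWiderDateTimeType.
  def toyWiden(a: T, b: T): Option[T] = (a, b) match {
    case (Ts(f1, p1), Ts(f2, p2)) =>
      Some(Ts(if (f1 == 'L' || f2 == 'L') 'L' else 'N', math.max(p1, p2)))
    case _ => None // any pairing with TIME has no common type
  }
}
```

Each suite would then call failures with its own widening function and assert
that the result is empty, so a cell added for one mode is exercised in the
other automatically.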
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]