MaxGekk commented on code in PR #56638:
URL: https://github.com/apache/spark/pull/56638#discussion_r3448927752


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionHelper.scala:
##########
@@ -244,14 +247,54 @@ abstract class TypeCoercionHelper {
     (d1, d2) match {
       case (_, _: TimeType) => None
       case (_: TimeType, _) => None
-      case (_: TimestampType, _: DateType) | (_: DateType, _: TimestampType) =>
-        Some(TimestampType)
 
-      case (_: TimestampType, _: TimestampNTZType) | (_: TimestampNTZType, _: 
TimestampType) =>
-        Some(TimestampType)
-
-      case (_: TimestampNTZType, _: DateType) | (_: DateType, _: 
TimestampNTZType) =>
-        Some(TimestampNTZType)
+      // The remaining datetime types (DATE and the micro/nanos TIMESTAMP_LTZ 
/ TIMESTAMP_NTZ
+      // families) widen along two independent axes:
+      //   - time-zone family: the result is LTZ if either input is 
LTZ-family, otherwise NTZ. This
+      //     mirrors the microsecond precedent where TIMESTAMP + TIMESTAMP_NTZ 
widens to TIMESTAMP.
+      //     DATE is family-neutral and adopts the family of the other side.
+      //   - precision: the maximum of the two precisions, where the micro 
types and DATE count as 6
+      //     and the nanos types contribute their own precision p in [7, 9].
+      // The (family, precision) pair then maps back to a concrete type: 
precision 6 yields the
+      // micro type, precision in [7, 9] yields the nanos type.
+      //
+      // Note: this common-type resolution is intentionally more permissive 
than the nanosecond
+      // conversion rules in Cast.canUpCast / Cast.canANSIStoreAssign, which 
keep cross-family and
+      // DATE <-> nanos casts explicit-CAST-only while the nanos types are 
unreleased (SPARK-57323
+      // etc.). Coercion here mirrors the microsecond precedent so that UNION 
/ CASE / coalesce /
+      // IN / comparison resolve a common type the same way they do for the 
micro families; the
+      // stricter explicit-only stance is deliberately scoped to up-cast and 
store assignment, not
+      // to common-type resolution.
+      case _ =>
+        def isLtz(d: DatetimeType): Boolean =
+          d.isInstanceOf[TimestampType] || 
d.isInstanceOf[TimestampLTZNanosType]
+        def isNtz(d: DatetimeType): Boolean =
+          d.isInstanceOf[TimestampNTZType] || 
d.isInstanceOf[TimestampNTZNanosType]
+        def precisionOf(d: DatetimeType): Int = d match {
+          case t: TimestampLTZNanosType => t.precision
+          case t: TimestampNTZNanosType => t.precision
+          case _ => 6 // DateType / TimestampType / TimestampNTZType
+        }
+        // Beyond TimeType (handled above), the only datetime types are DATE 
and the micro/nanos
+        // timestamp families. Guard so that a future DatetimeType subtype 
fails fast here instead
+        // of being silently mis-widened (treated as a family-neutral 
precision-6 type and folded
+        // into DATE) when it should be wired in explicitly.
+        def isWidenable(d: DatetimeType): Boolean =
+          isLtz(d) || isNtz(d) || d.isInstanceOf[DateType]
+        if (!isWidenable(d1) || !isWidenable(d2)) {
+          throw SparkException.internalError(
+            s"Unexpected datetime types in findWiderDateTimeType: $d1, $d2")
+        } else if (!isLtz(d1) && !isNtz(d1) && !isLtz(d2) && !isNtz(d2)) {
+          // Both sides are DATE; callers short-circuit equal types, so this 
is just defensive.
+          Some(DateType)

Review Comment:
   Confirmed - unreachable via the current callers. `findWiderDateTimeType` is 
only called from `TypeCoercion.findTightestCommonType` and 
`AnsiTypeCoercion.findTightestCommonType`, both of which short-circuit `case 
(t1, t2) if t1 == t2 => Some(t1)` before reaching here, and `DateType` is a 
singleton case object, so a `DATE`/`DATE` pair never gets this far. The 
`Some(DateType)` arm is just a defensive, semantically-correct default (hence 
the comment). If you'd prefer no dead code, I can fold it into the 
`internalError` guard instead - happy to go either way.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionHelper.scala:
##########
@@ -244,14 +247,54 @@ abstract class TypeCoercionHelper {
     (d1, d2) match {
       case (_, _: TimeType) => None
       case (_: TimeType, _) => None

Review Comment:
   Good eye - `TimeType` is currently the only `AnyTimeType` subtype, so the 
two match the same set today. I'd lean toward keeping `_: TimeType` though, for 
two reasons: (1) these two arms are pre-existing (this PR only adds the `case _ 
=>` arm), so broadening them is orthogonal to nanos widening; and (2) it would 
partly defeat the fail-fast guard just below - the intent is that a genuinely 
new `DatetimeType` (a future `AnyTimeType` subtype included) trips 
`internalError` and gets triaged explicitly, rather than silently resolving to 
`None`, which may not be the right common-type semantics for it (it could 
warrant its own precision-widening rule, like the timestamp families do). Happy 
to switch if you feel strongly, but my preference is the minimal pre-existing 
form.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to