MaxGekk commented on code in PR #56638:
URL: https://github.com/apache/spark/pull/56638#discussion_r3448927752
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionHelper.scala:
##########
@@ -244,14 +247,54 @@ abstract class TypeCoercionHelper {
(d1, d2) match {
case (_, _: TimeType) => None
case (_: TimeType, _) => None
- case (_: TimestampType, _: DateType) | (_: DateType, _: TimestampType) =>
- Some(TimestampType)
- case (_: TimestampType, _: TimestampNTZType) | (_: TimestampNTZType, _:
TimestampType) =>
- Some(TimestampType)
-
- case (_: TimestampNTZType, _: DateType) | (_: DateType, _:
TimestampNTZType) =>
- Some(TimestampNTZType)
+ // The remaining datetime types (DATE and the micro/nanos TIMESTAMP_LTZ
/ TIMESTAMP_NTZ
+ // families) widen along two independent axes:
+ // - time-zone family: the result is LTZ if either input is
LTZ-family, otherwise NTZ. This
+ // mirrors the microsecond precedent where TIMESTAMP + TIMESTAMP_NTZ
widens to TIMESTAMP.
+ // DATE is family-neutral and adopts the family of the other side.
+ // - precision: the maximum of the two precisions, where the micro
types and DATE count as 6
+ // and the nanos types contribute their own precision p in [7, 9].
+ // The (family, precision) pair then maps back to a concrete type:
precision 6 yields the
+ // micro type, precision in [7, 9] yields the nanos type.
+ //
+ // Note: this common-type resolution is intentionally more permissive
than the nanosecond
+ // conversion rules in Cast.canUpCast / Cast.canANSIStoreAssign, which
keep cross-family and
+ // DATE <-> nanos casts explicit-CAST-only while the nanos types are
unreleased (SPARK-57323
+ // etc.). Coercion here mirrors the microsecond precedent so that UNION
/ CASE / coalesce /
+ // IN / comparison resolve a common type the same way they do for the
micro families; the
+ // stricter explicit-only stance is deliberately scoped to up-cast and
store assignment, not
+ // to common-type resolution.
+ case _ =>
+ def isLtz(d: DatetimeType): Boolean =
+ d.isInstanceOf[TimestampType] ||
d.isInstanceOf[TimestampLTZNanosType]
+ def isNtz(d: DatetimeType): Boolean =
+ d.isInstanceOf[TimestampNTZType] ||
d.isInstanceOf[TimestampNTZNanosType]
+ def precisionOf(d: DatetimeType): Int = d match {
+ case t: TimestampLTZNanosType => t.precision
+ case t: TimestampNTZNanosType => t.precision
+ case _ => 6 // DateType / TimestampType / TimestampNTZType
+ }
+ // Beyond TimeType (handled above), the only datetime types are DATE
and the micro/nanos
+ // timestamp families. Guard so that a future DatetimeType subtype
fails fast here instead
+ // of being silently mis-widened (treated as a family-neutral
precision-6 type and folded
+ // into DATE) when it should be wired in explicitly.
+ def isWidenable(d: DatetimeType): Boolean =
+ isLtz(d) || isNtz(d) || d.isInstanceOf[DateType]
+ if (!isWidenable(d1) || !isWidenable(d2)) {
+ throw SparkException.internalError(
+ s"Unexpected datetime types in findWiderDateTimeType: $d1, $d2")
+ } else if (!isLtz(d1) && !isNtz(d1) && !isLtz(d2) && !isNtz(d2)) {
+ // Both sides are DATE; callers short-circuit equal types, so this
is just defensive.
+ Some(DateType)
Review Comment:
Confirmed - unreachable via the current callers. `findWiderDateTimeType` is
only called from `TypeCoercion.findTightestCommonType` and
`AnsiTypeCoercion.findTightestCommonType`, both of which short-circuit `case
(t1, t2) if t1 == t2 => Some(t1)` before reaching here, and `DateType` is a
singleton case object, so a `DATE`/`DATE` pair never gets this far. The
`Some(DateType)` arm is just a defensive, semantically-correct default (hence
the comment). If you'd prefer no dead code, I can fold it into the
`internalError` guard instead - happy to go either way.
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionHelper.scala:
##########
@@ -244,14 +247,54 @@ abstract class TypeCoercionHelper {
(d1, d2) match {
case (_, _: TimeType) => None
case (_: TimeType, _) => None
Review Comment:
Good eye - `TimeType` is currently the only `AnyTimeType` subtype, so the
two match the same set today. I'd lean toward keeping `_: TimeType` though, for
two reasons: (1) these two arms are pre-existing (this PR only adds the `case _
=>` arm), so broadening them is orthogonal to nanos widening; and (2) it would
partly defeat the fail-fast guard just below - the intent is that a genuinely
new `DatetimeType` (a future `AnyTimeType` subtype included) trips
`internalError` and gets triaged explicitly, rather than silently resolving to
`None`, which may not be the right common-type semantics for it (it could
warrant its own precision-widening rule, like the timestamp families do). Happy
to switch if you feel strongly, but my preference is the minimal pre-existing
form.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]