MaxGekk commented on code in PR #56637:
URL: https://github.com/apache/spark/pull/56637#discussion_r3455498270
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala:
##########
@@ -1231,34 +1231,76 @@ case class DateFormatClass(left: Expression, right:
Expression, timeZoneId: Opti
def this(left: Expression, right: Expression) = this(left, right, None)
override def inputTypes: Seq[AbstractDataType] =
- Seq(TimestampType, StringTypeWithCollation(supportsTrimCollation = true))
+ Seq(TypeCollection(TimestampType, AnyTimeType),
Review Comment:
Following up on my own earlier verification of `SimplifyDateTimeConversions`
— I cleared it in the last two rounds, but that was incomplete. Accepting TIME
here newly activates **case 1** of that rule
(`sql/catalyst/.../optimizer/expressions.scala:1208`), which rewrites
`date_format(to_timestamp(date_format(X, p), p), p)` → `date_format(X, p)`. The
rule keeps the inner `date_format`, and its child `X` (the `_` in `e @
DateFormatClass(_, pattern, timeZoneId)`) is unconstrained — so with this
change it can be TIME. My prior reasoning only checked the *outer* position
(always `TimestampType` — correct), but the rewrite preserves the *inner* node.
The elimination is a no-op only when `X` is micro-aligned. For a
sub-microsecond `TIME(7/8/9)` value with a sub-micro fractional pattern, the
original truncates through the micros `TimestampType` intermediate while the
rewrite keeps full nanos:
```sql
-- time_col : TIME(9) '12:13:14.123456789'
select date_format(to_timestamp(date_format(time_col, 'HH:mm:ss.SSSSSSSSS'),
'HH:mm:ss.SSSSSSSSS'), 'HH:mm:ss.SSSSSSSSS');
-- unoptimized: 12:13:14.123456000 (truncated through micros)
-- optimized: 12:13:14.123456789 (rule returns inner
date_format(time_col, ...))
```
Reachable in default config (`to_timestamp` lowers to `GetTimestamp(_, _,
TimestampType, ...)` after `ReplaceExpressions`; only
`spark.sql.timeType.enabled` is needed). Minimal fix: guard case 1 to skip when
the inner `DateFormatClass`'s child is `TimeType` (case 2 is fine — its
`DateFormatClass` always sits over a `GetTimestamp`/`TimestampType` child).
Worth a regression test asserting optimized == unoptimized for a `TIME(9)`
round-trip.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]