MaxGekk commented on code in PR #56637:
URL: https://github.com/apache/spark/pull/56637#discussion_r3455498270


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala:
##########
@@ -1231,34 +1231,76 @@ case class DateFormatClass(left: Expression, right: 
Expression, timeZoneId: Opti
   def this(left: Expression, right: Expression) = this(left, right, None)
 
   override def inputTypes: Seq[AbstractDataType] =
-    Seq(TimestampType, StringTypeWithCollation(supportsTrimCollation = true))
+    Seq(TypeCollection(TimestampType, AnyTimeType),

Review Comment:
   Following up on my own earlier verification of `SimplifyDateTimeConversions` 
— I cleared it in the last two rounds, but that was incomplete. Accepting TIME 
here newly activates **case 1** of that rule 
(`sql/catalyst/.../optimizer/expressions.scala:1208`), which rewrites 
`date_format(to_timestamp(date_format(X, p), p), p)` → `date_format(X, p)`. The 
rule keeps the inner `date_format`, and its child `X` (the `_` in `e @ 
DateFormatClass(_, pattern, timeZoneId)`) is unconstrained — so with this 
change it can be TIME. My prior reasoning only checked the *outer* position 
(always `TimestampType` — correct), but the rewrite preserves the *inner* node.
   
   The elimination is a no-op only when `X` is micro-aligned. For a 
sub-microsecond `TIME(7/8/9)` value with a sub-micro fractional pattern, the 
original truncates through the micros `TimestampType` intermediate while the 
rewrite keeps full nanos:
   
   ```sql
   -- time_col : TIME(9) '12:13:14.123456789'
   select date_format(to_timestamp(date_format(time_col, 'HH:mm:ss.SSSSSSSSS'),
                                   'HH:mm:ss.SSSSSSSSS'), 'HH:mm:ss.SSSSSSSSS');
   -- unoptimized: 12:13:14.123456000   (truncated through micros)
   -- optimized:   12:13:14.123456789   (rule returns inner 
date_format(time_col, ...))
   ```
   
   Reachable in default config (`to_timestamp` lowers to `GetTimestamp(_, _, 
TimestampType, ...)` after `ReplaceExpressions`; only 
`spark.sql.timeType.enabled` is needed). Minimal fix: guard case 1 to skip when 
the inner `DateFormatClass`'s child is `TimeType` (case 2 is fine — its 
`DateFormatClass` always sits over a `GetTimestamp`/`TimestampType` child). 
Worth a regression test asserting optimized == unoptimized for a `TIME(9)` 
round-trip.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to