Re: [PR] [SPARK-57285][SQL] Route nanosecond timestamp cast-to-string through the Types Framework [spark]

via GitHub Mon, 08 Jun 2026 13:55:33 -0700


uros-b commented on code in PR #56355:
URL: https://github.com/apache/spark/pull/56355#discussion_r3376305734



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ToStringBase.scala:
##########
@@ -66,16 +66,14 @@ trait ToStringBase { self: UnaryExpression with TimeZoneAwareExpression =>
       case NoConstraint => castToString(from)
     }
 
-  private def castToString(from: DataType): Any => UTF8String = from match {
-    // Nanosecond timestamp string formatting is zone-aware (LTZ renders in the session time zone),
-    // so it lives in castToStringDefault alongside the microsecond timestamp types rather than the
-    // zone-less Types Framework formatter (SPARK-57256).
-    case _: TimestampNTZNanosType | _: TimestampLTZNanosType => castToStringDefault(from)
-    case _ =>
-      TypeApiOps(from)
-        .map(ops => acceptAny[Any](v => ops.formatUTF8(v)))
-        .getOrElse(castToStringDefault(from))
-  }
+  // The Types Framework is the single integration point for framework types' cast-to-string, via
+  // the zone-less formatUTF8. The cast's session zone is threaded into the lookup so TIMESTAMP_LTZ
+  // nanos renders in it; zone-independent types (TimeType, TIMESTAMP_NTZ nanos) ignore it
+  // (SPARK-57285).
+  private def castToString(from: DataType): Any => UTF8String =
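The new comment says the session zone is threaded into the ops lookup, so only TIMESTAMP_LTZ nanos depends on it. A minimal, self-contained sketch of that shape follows, using only `java.time`. `NanosType`, `FormatOps` and `ZoneThreadedLookup.lookup` are hypothetical stand-ins, not Spark's `TypeApiOps` API, and values are assumed to be nanoseconds since the epoch.

```scala
import java.time.{Instant, LocalDateTime, ZoneId, ZoneOffset}
import java.time.format.DateTimeFormatter

// Hypothetical stand-ins for the framework's type descriptors; not Spark's real classes.
sealed trait NanosType
case object TimestampNTZNanos extends NanosType
case object TimestampLTZNanos extends NanosType

// Hypothetical ops: renders a value stored as nanoseconds since the epoch.
trait FormatOps { def format(nanos: Long): String }

object ZoneThreadedLookup {
  // Nine fraction digits so nanosecond precision survives formatting.
  private val pattern = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSSSSS")

  // floorDiv/floorMod keep pre-epoch (negative) values correct.
  private def toLocal(nanos: Long, zone: ZoneId): LocalDateTime = {
    val secs = Math.floorDiv(nanos, 1000000000L)
    val nano = Math.floorMod(nanos, 1000000000L)
    LocalDateTime.ofInstant(Instant.ofEpochSecond(secs, nano), zone)
  }

  // The session zone is passed into the lookup; only the LTZ branch uses it.
  def lookup(t: NanosType, sessionZone: ZoneId): FormatOps = t match {
    case TimestampLTZNanos => (nanos: Long) => pattern.format(toLocal(nanos, sessionZone))
    // NTZ is zone-independent: the stored value is read as a UTC wall clock.
    case TimestampNTZNanos => (nanos: Long) => pattern.format(toLocal(nanos, ZoneOffset.UTC))
  }
}
```

With a `+09:00` session zone, the LTZ ops shifts the wall clock by nine hours while the NTZ ops output is unchanged, matching the "zone-independent types ignore it" wording in the diff.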

Review Comment:
   Not blocking for this particular PR, but more of a performance discussion:
   
   For a nanos timestamp nested in an array/map/struct, castToString(et) is invoked inside the per-row element closure, so each row constructs a fresh ops instance whose lazy formatter builds a new TimestampFormatter. Previously, the nested path reused the shared ToStringBase-level timestampFormatter / timestampNTZFormatter. This is a (bounded) per-row allocation regression for interpreted execution of nested nanos timestamps.
   
   The same per-row ops pattern already exists for TimeType on master, so this is consistent with the framework's current shape. The difference, however, is that TimestampFormatter.getFractionFormatter is heavier than the time formatter. The codegen path is unaffected (ops are built once at codegen time). Probably acceptable, but worth confirming it's a conscious trade-off.
   
   Since codegen is the default, are we aware of the performance implications for interpreted nested-collection casts, and are there any long-term plans to address them?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

