Re: [PR] [SPARK-57338][SQL] Render external values in Row JSON via formatExternal [spark]

via GitHub Tue, 09 Jun 2026 11:48:14 -0700


cloud-fan commented on code in PR #56392:
URL: https://github.com/apache/spark/pull/56392#discussion_r3383062055



##########
sql/api/src/main/scala/org/apache/spark/sql/Row.scala:
##########
@@ -632,8 +632,14 @@ trait Row extends Serializable {
       if (value == null) {
         JNull
       } else {
+        // A public Row holds external values (e.g. java.time.LocalTime for 
TimeType), so render
+        // through the framework's formatExternal rather than format, which 
expects the internal
+        // representation and would otherwise fail the value cast. A framework 
type either returns a
+        // rendered string or raises its own error (e.g. the nanosecond 
timestamp types raise the
+        // unsupported-rendering error); types outside the framework fall back 
to legacy rendering.
         TypeApiOps(dataType)
-          .map(ops => JString(ops.format(value)))
+          .flatMap(_.formatExternal(value))

Review Comment:
   This fixes the framework-enabled path, but the same user-facing bug survives 
in the legacy branch: with `spark.sql.timeType.enabled=true` and 
`spark.sql.types.framework.enabled=false` (the production default — both 
default to `Utils.isTesting`), the external `LocalTime` falls into 
`toJsonDefault`, which only matches `(nanos: Long, _: TimeType)` — the internal 
representation a public `Row` never holds (from SPARK-54451) — so `Row.json` on 
a TIME column still fails, now with `FAILED_ROW_TO_JSON`. 
`HiveResult.toHiveStringDefault` handles exactly this case in its legacy 
fallback (`case (lt: LocalTime, _: TimeType) => formatters.time.format(lt)`), 
so SQL CLI output works where Row JSON fails. How about adding `case (lt: 
LocalTime, _: TimeType) => JString(timeFormatter.format(lt))` to 
`toJsonDefault` (plus a framework-off test)? While there, the existing `Long` 
case is worth a second look: under the framework it's unreachable for valid 
rows, and a `Long`-valued row now hits the `a
 sInstanceOf[LocalTime]` CCE in `formatExternal` before the fallback is even 
consulted.



##########
sql/api/src/main/scala/org/apache/spark/sql/types/ops/TimestampNanosTypeApiOps.scala:
##########
@@ -57,6 +57,19 @@ abstract class TimestampNanosTypeApiOps extends TypeApiOps 
with DataTypeErrorsBa
   override def format(v: Any): String =
     throw DataTypeErrors.cannotConvertNanosTimestampToStringError(dataType)
 
+  // Row JSON (Row.json / Row.prettyJson) holds the external Row value 
(java.time.Instant for LTZ,
+  // java.time.LocalDateTime for NTZ), but rendering nanosecond timestamps 
from this zone-less path
+  // is not supported yet (same reason as format above), so raise the 
user-facing
+  // UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_TO_STRING error here rather than 
silently truncating.
+  override def formatExternal(value: Any): Option[String] =
+    throw DataTypeErrors.cannotConvertNanosTimestampToStringError(dataType)
+
+  // The Hive result path (HiveResult.toHiveString) is zone-aware and renders 
nanosecond timestamps
+  // through its own default formatter, so return None here to fall through to 
it rather than raise
+  // the unsupported-rendering error that the zone-less single-arg 
formatExternal above throws. This
+  // is a temporary override until nanos external rendering is unified across 
the two paths.
+  override def formatExternal(value: Any, nested: Boolean): Option[String] = 
None

Review Comment:
   These two overrides give the `formatExternal` overloads caller-specific 
semantics (single-arg = zone-less Row JSON, may throw; two-arg = zone-aware 
Hive, `None` = fall through), but the `TypeApiOps` trait still documents them 
under "Hive Formatting (optional)" with the 2-param overload described as a 
*nesting* variant, and nothing says an implementation may throw instead of 
returning `None`. A new type implementer reading the trait won't discover that 
the single-arg overload controls `Row.json` output, or that returning `None` 
from it silently drops their type into the legacy `toJsonDefault` path 
(currently every registered type overrides it, but that's convention, not 
contract). Since the temporary split is already well-documented here at the 
override sites, could you also update the trait scaladoc to name both consumers 
and define the `Some`/`None`/throw semantics?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [SPARK-57338][SQL] Render external values in Row JSON via formatExternal [spark]

Reply via email to