cloud-fan commented on code in PR #56392:
URL: https://github.com/apache/spark/pull/56392#discussion_r3383062055
##########
sql/api/src/main/scala/org/apache/spark/sql/Row.scala:
##########
@@ -632,8 +632,14 @@ trait Row extends Serializable {
if (value == null) {
JNull
} else {
+ // A public Row holds external values (e.g. java.time.LocalTime for
TimeType), so render
+ // through the framework's formatExternal rather than format, which
expects the internal
+ // representation and would otherwise fail the value cast. A framework
type either returns a
+ // rendered string or raises its own error (e.g. the nanosecond
timestamp types raise the
+ // unsupported-rendering error); types outside the framework fall back
to legacy rendering.
TypeApiOps(dataType)
- .map(ops => JString(ops.format(value)))
+ .flatMap(_.formatExternal(value))
Review Comment:
This fixes the framework-enabled path, but the same user-facing bug survives
in the legacy branch: with `spark.sql.timeType.enabled=true` and
`spark.sql.types.framework.enabled=false` (the production default — both
default to `Utils.isTesting`), the external `LocalTime` falls into
`toJsonDefault`, which only matches `(nanos: Long, _: TimeType)` — the internal
representation a public `Row` never holds (from SPARK-54451) — so `Row.json` on
a TIME column still fails, now with `FAILED_ROW_TO_JSON`.
`HiveResult.toHiveStringDefault` handles exactly this case in its legacy
fallback (`case (lt: LocalTime, _: TimeType) => formatters.time.format(lt)`),
so SQL CLI output works where Row JSON fails. How about adding `case (lt:
LocalTime, _: TimeType) => JString(timeFormatter.format(lt))` to
`toJsonDefault` (plus a framework-off test)? While there, the existing `Long`
case is worth a second look: under the framework it's unreachable for valid
rows, and a `Long`-valued row now hits the `a
sInstanceOf[LocalTime]` CCE in `formatExternal` before the fallback is even
consulted.
##########
sql/api/src/main/scala/org/apache/spark/sql/types/ops/TimestampNanosTypeApiOps.scala:
##########
@@ -57,6 +57,19 @@ abstract class TimestampNanosTypeApiOps extends TypeApiOps
with DataTypeErrorsBa
override def format(v: Any): String =
throw DataTypeErrors.cannotConvertNanosTimestampToStringError(dataType)
+ // Row JSON (Row.json / Row.prettyJson) holds the external Row value
(java.time.Instant for LTZ,
+ // java.time.LocalDateTime for NTZ), but rendering nanosecond timestamps
from this zone-less path
+ // is not supported yet (same reason as format above), so raise the
user-facing
+ // UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_TO_STRING error here rather than
silently truncating.
+ override def formatExternal(value: Any): Option[String] =
+ throw DataTypeErrors.cannotConvertNanosTimestampToStringError(dataType)
+
+ // The Hive result path (HiveResult.toHiveString) is zone-aware and renders
nanosecond timestamps
+ // through its own default formatter, so return None here to fall through to
it rather than raise
+ // the unsupported-rendering error that the zone-less single-arg
formatExternal above throws. This
+ // is a temporary override until nanos external rendering is unified across
the two paths.
+ override def formatExternal(value: Any, nested: Boolean): Option[String] =
None
Review Comment:
These two overrides give the `formatExternal` overloads caller-specific
semantics (single-arg = zone-less Row JSON, may throw; two-arg = zone-aware
Hive, `None` = fall through), but the `TypeApiOps` trait still documents them
under "Hive Formatting (optional)" with the 2-param overload described as a
*nesting* variant, and nothing says an implementation may throw instead of
returning `None`. A new type implementer reading the trait won't discover that
the single-arg overload controls `Row.json` output, or that returning `None`
from it silently drops their type into the legacy `toJsonDefault` path
(currently every registered type overrides it, but that's convention, not
contract). Since the temporary split is already well-documented here at the
override sites, could you also update the trait scaladoc to name both consumers
and define the `Some`/`None`/throw semantics?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]