uros-db commented on code in PR #54325:
URL: https://github.com/apache/spark/pull/54325#discussion_r2817662508
##########
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/RowSetUtils.scala:
##########
@@ -142,7 +142,16 @@ object RowSetUtils {
val value = if (row.isNullAt(ordinal)) {
""
} else {
- toHiveString((row.get(ordinal), typ), nested = true,
timeFormatters, binaryFormatter)
+ // Geospatial types implement nested quoting in `toHiveString` (wrapping EWKT in double
+ // quotes when nested = true), intended for values inside containers like arrays, maps,
+ // or structs. However, in this Thrift Server code path, `nested` is always set to true
+ // because values are serialized into Hive string columns. We override to false here for
+ // singular geospatial types to avoid spurious quotes around standalone EWKT values.
Review Comment:
There is a reason, actually. Geospatial types are the only non-string types
that are quote-aware in `toHiveString`. This was a deliberate design choice for
output clarity, because the text-based EWKT format inherently carries a
delimiter (`;`) after the SRID prefix. For a single geometry we want
`SRID=4326;POINT(1 2)`, but for an array we want
`["SRID=4326;POINT(1 2)","SRID=4326;POINT(3 4)"]`.
Every other non-primitive type that currently goes through the catch-all
default case (intervals, variant, UDT) ignores the `nested` flag and returns
the same string regardless.
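
To make the distinction concrete, here is a minimal, self-contained sketch of the quoting behavior described above. It is not Spark's actual `toHiveString`: the `Value`, `Geo`, `Interval`, `Arr`, and `HiveStringSketch.render` names are hypothetical stand-ins, and the real function takes different arguments.

```scala
// Hypothetical model types for illustration only; not Spark classes.
sealed trait Value
final case class Geo(srid: Int, wkt: String) extends Value
final case class Interval(text: String) extends Value
final case class Arr(elems: Seq[Value]) extends Value

object HiveStringSketch {
  // Assumed simplification of the behavior discussed in this thread:
  // only geospatial values react to the `nested` flag.
  def render(v: Value, nested: Boolean): String = v match {
    case Geo(srid, wkt) =>
      val ewkt = s"SRID=$srid;$wkt"
      // Quote EWKT only when it sits inside a container.
      if (nested) "\"" + ewkt + "\"" else ewkt
    case Interval(text) =>
      // Catch-all style types return the same string either way.
      text
    case Arr(elems) =>
      // Elements of a container are always rendered as nested.
      elems.map(render(_, nested = true)).mkString("[", ",", "]")
  }
}
```

Under these assumptions, rendering a top-level `Geo` with `nested = true` would wrap the EWKT in quotes, which is the spurious quoting the override in this diff avoids.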
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]