uros-db commented on code in PR #54325:
URL: https://github.com/apache/spark/pull/54325#discussion_r2817662508


##########
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/RowSetUtils.scala:
##########
@@ -142,7 +142,16 @@ object RowSetUtils {
           val value = if (row.isNullAt(ordinal)) {
             ""
           } else {
-            toHiveString((row.get(ordinal), typ), nested = true, timeFormatters, binaryFormatter)
+            // Geospatial types implement nested quoting in `toHiveString` (wrapping EWKT in double
+            // quotes when nested = true), intended for values inside containers like arrays, maps,
+            // or structs. However, in this Thrift Server code path, `nested` is always set to true
+            // because values are serialized into Hive string columns. We override to false here for
+            // singular geospatial types to avoid spurious quotes around standalone EWKT values.

Review Comment:
   There is actually a reason. Geospatial types are the only non-string types
   that are quote-aware in `toHiveString`. This was a deliberate design choice
   for output clarity, because the text-based EWKT format inherently carries a
   delimiter for the SRID prefix. For a single geometry we want
   `SRID=4326;POINT(1 2)`, but for an array we want
   `["SRID=4326;POINT(1 2)","SRID=4326;POINT(3 4)"]`.
   Every other non-primitive type that currently goes through the catch-all
   default case (intervals, variant, UDT) ignores the `nested` flag and returns
   the same string regardless.
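
To make the distinction concrete, here is a minimal toy sketch of quote-aware rendering. It is not Spark's actual `toHiveString`: the `Value` hierarchy, `render`, and `renderCell` are hypothetical names invented for illustration, and `renderCell` only approximates the top-level override this PR describes.

```scala
// Toy model only: NOT Spark's `toHiveString`. All names here are hypothetical.
object NestedQuotingSketch {
  sealed trait Value
  final case class Geo(ewkt: String) extends Value      // e.g. "SRID=4326;POINT(1 2)"
  final case class Interval(text: String) extends Value // stands in for a nested-agnostic type
  final case class Arr(elems: Seq[Value]) extends Value

  // Geospatial values are quoted only when nested; other types ignore the flag.
  def render(v: Value, nested: Boolean): String = v match {
    case Geo(ewkt) if nested => "\"" + ewkt + "\""
    case Geo(ewkt)           => ewkt
    case Interval(text)      => text
    case Arr(elems)          => elems.map(render(_, nested = true)).mkString("[", ",", "]")
  }

  // Assumed shape of the override: a caller that normally passes nested = true
  // switches to nested = false for a standalone (top-level) geospatial value.
  def renderCell(v: Value): String = v match {
    case g: Geo => render(g, nested = false)
    case other  => render(other, nested = true)
  }

  def main(args: Array[String]): Unit = {
    val p1 = Geo("SRID=4326;POINT(1 2)")
    val p2 = Geo("SRID=4326;POINT(3 4)")
    println(renderCell(p1))               // SRID=4326;POINT(1 2)
    println(renderCell(Arr(Seq(p1, p2)))) // ["SRID=4326;POINT(1 2)","SRID=4326;POINT(3 4)"]
  }
}
```

In this sketch the interval case returns the same text for either value of `nested`, which is why only the geospatial case needs special handling at the top level.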



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

