pang-wu commented on code in PR #41498:
URL: https://github.com/apache/spark/pull/41498#discussion_r1222252120
##########
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDeserializer.scala:
##########
@@ -247,12 +247,86 @@ private[sql] class ProtobufDeserializer(
updater.setLong(ordinal, micros +
TimeUnit.NANOSECONDS.toMicros(nanoSeconds))
case (MESSAGE, StringType)
- if protoType.getMessageType.getFullName == "google.protobuf.Any" =>
+ if protoType.getMessageType.getFullName == "google.protobuf.Any" =>
(updater, ordinal, value) =>
// Convert 'Any' protobuf message to JSON string.
val jsonStr = jsonPrinter.print(value.asInstanceOf[DynamicMessage])
updater.set(ordinal, UTF8String.fromString(jsonStr))
+ // Handle well known wrapper types. We unpack the value field instead of keeping
Review Comment:
> Better for Spark to preserve the same information, right?
I would say no. The issue here is that after converting these data types to a
struct, Spark erases the original type info: all the user sees is a struct, and
that struct could just as well be a custom struct defined by the user rather than a
wrapper type. In that case we provide no additional information for the user to
decide whether a special action needs to be taken -- remember the data consumer may
not have the original schema at hand.
The idea behind wrapper types is that they are structures with a special,
well-known meaning, so parsers can leverage that type information to get field
presence. What we are doing here removes that information, which is the opposite
of what we want.
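To make the distinction concrete, here is a minimal sketch (using Spark SQL types only, not code from this PR) of the two column shapes being discussed. `Int64Value` is just an illustrative wrapper choice, and the `value` field name follows the well-known wrapper convention:

```scala
import org.apache.spark.sql.types._

// Shape A: keep google.protobuf.Int64Value as a struct.
// A consumer without the original .proto cannot tell this apart from a
// user-defined message that happens to have a single int64 `value` field.
val keptAsStruct: DataType =
  StructType(Seq(StructField("value", LongType, nullable = true)))

// Shape B: unpack the wrapper to its primitive.
// An unset wrapper surfaces as a null Long; the wrapper type itself is gone.
val unpacked: DataType = LongType
```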