justaparth commented on code in PR #41498:
URL: https://github.com/apache/spark/pull/41498#discussion_r1222248127


##########
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDeserializer.scala:
##########
@@ -247,12 +247,86 @@ private[sql] class ProtobufDeserializer(
           updater.setLong(ordinal, micros + 
TimeUnit.NANOSECONDS.toMicros(nanoSeconds))
 
       case (MESSAGE, StringType)
-          if protoType.getMessageType.getFullName == "google.protobuf.Any" =>
+        if protoType.getMessageType.getFullName == "google.protobuf.Any" =>
         (updater, ordinal, value) =>
           // Convert 'Any' protobuf message to JSON string.
           val jsonStr = jsonPrinter.print(value.asInstanceOf[DynamicMessage])
           updater.set(ordinal, UTF8String.fromString(jsonStr))
 
+      // Handle well known wrapper types. We unpack the value field instead of 
keeping

Review Comment:
   > Protobuf documentation says this is mainly for use with Any fields, and 
for the other use case, we can use optional int32.
   
   It's not really relevant what users of this library _could_ have done 
instead of using these well known types; but as additional context, optional 
didn't exist until recent protobuf versions for proto3 anyways, so thats why 
this type existed before that.
   
   > Better for Spark to preserve the same information, right?
   > The original purpose in Protobuf is to distinguish unset vs default value 
(as you mentioned). Spark handling preserves that.
   
   The information about presence is preserved even if we deserialize into 
primitive type instead of as a struct.
   
   
   > Just because of JSON serializer does this does not imply Spark should do.
   
   I agree in principle, but in practice these json libraries represent 
battle-tested and agreed upon ways to deserialize protobufs. (in comparison, 
this spark protobuf parsing is relatively quite new). Spark struct and json 
aren't really that different, both are a data representation of a protocol 
buffer. Given that, I feel like it makes sense to use the json deserialization 
as a useful guide when there are decisions to be made. And it clearly seems 
like these wrappers are intended to function like primitives, not like structs.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to