justaparth commented on code in PR #41498:
URL: https://github.com/apache/spark/pull/41498#discussion_r1222183033


##########
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDeserializer.scala:
##########
@@ -247,12 +247,86 @@ private[sql] class ProtobufDeserializer(
           updater.setLong(ordinal, micros + 
TimeUnit.NANOSECONDS.toMicros(nanoSeconds))
 
       case (MESSAGE, StringType)
-          if protoType.getMessageType.getFullName == "google.protobuf.Any" =>
+        if protoType.getMessageType.getFullName == "google.protobuf.Any" =>
         (updater, ordinal, value) =>
           // Convert 'Any' protobuf message to JSON string.
           val jsonStr = jsonPrinter.print(value.asInstanceOf[DynamicMessage])
           updater.set(ordinal, UTF8String.fromString(jsonStr))
 
+      // Handle well known wrapper types. We unpack the value field instead of 
keeping

Review Comment:
   > This says the purpose is to find if the value is set or just the default
   
   This is the purpose of the wrapper types, not of this PR.
   
   This PR is trying to respect how wrapper types should be interpreted when 
deserialized into other formats. The documentation for the types indicates this 
behavior explicitly for json, e.g. for BytesValue 
https://protobuf.dev/reference/protobuf/google.protobuf/#bytes-value 
   
   ```
   Wrapper message for double.
   
   The JSON representation for DoubleValue is JSON number.
   ```
   
   The source definition for the messages also says this 
https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/wrappers.proto
 
   
   And if you look at Jsonformat in java protobuf utils or golang's json 
library, they also make sure to special case these so that they're not expanded 
as messages, but rather treated like primitives.
   
   In this PR i do the same thing with respect to deserializing to a spark 
struct, as these wrapper types are intended to be treated like primitives. 
   
    > Also, how do we handle backward compatibility?
    
   Currently, this PR represents a change in the behavior. Open to discussion 
of course, but I think that the behavior currently is inconsistent with the 
intention of the types / other libraries, so we should make it the default 
behavior rather than hide it behind an option. We _could_ introduce an option 
to allow expanding these as messages if users want to preserve that kind of 
behavior.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to