justaparth commented on code in PR #41498:
URL: https://github.com/apache/spark/pull/41498#discussion_r1222183033
##########
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDeserializer.scala:
##########
@@ -247,12 +247,86 @@ private[sql] class ProtobufDeserializer(
updater.setLong(ordinal, micros +
TimeUnit.NANOSECONDS.toMicros(nanoSeconds))
case (MESSAGE, StringType)
- if protoType.getMessageType.getFullName == "google.protobuf.Any" =>
+ if protoType.getMessageType.getFullName == "google.protobuf.Any" =>
(updater, ordinal, value) =>
// Convert 'Any' protobuf message to JSON string.
val jsonStr = jsonPrinter.print(value.asInstanceOf[DynamicMessage])
updater.set(ordinal, UTF8String.fromString(jsonStr))
+ // Handle well known wrapper types. We unpack the value field instead of
keeping
Review Comment:
> This says the purpose is to find if the value is set or just the default
This is the purpose of the wrapper types, not of this PR.
This PR is trying to respect how wrapper types should be interpreted when
deserialized into other formats. The documentation for the types indicates this
behavior explicitly for json, e.g. for BytesValue
https://protobuf.dev/reference/protobuf/google.protobuf/#bytes-value
```
Wrapper message for double.
The JSON representation for DoubleValue is JSON number.
```
The source definition for the messages also says this
https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/wrappers.proto
And if you look at Jsonformat in java protobuf utils or golang's json
library, they also make sure to special case these so that they're not expanded
as messages, but rather treated like primitives.
In this PR i do the same thing with respect to deserializing to a spark
struct, as these wrapper types are intended to be treated like primitives.
> Also, how do we handle backward compatibility?
Currently, this PR represents a change in the behavior. Open to discussion
of course, but I think that the behavior currently is inconsistent with the
intention of the types / other libraries, so we should make it the default
behavior rather than hide it behind an option. We _could_ introduce an option
to allow expanding these as messages if users want to preserve that kind of
behavior.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]