rangadi commented on code in PR #41498:
URL: https://github.com/apache/spark/pull/41498#discussion_r1222534443
##########
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDeserializer.scala:
##########
@@ -247,12 +247,86 @@ private[sql] class ProtobufDeserializer(
updater.setLong(ordinal, micros +
TimeUnit.NANOSECONDS.toMicros(nanoSeconds))
case (MESSAGE, StringType)
- if protoType.getMessageType.getFullName == "google.protobuf.Any" =>
+ if protoType.getMessageType.getFullName == "google.protobuf.Any" =>
(updater, ordinal, value) =>
// Convert 'Any' protobuf message to JSON string.
val jsonStr = jsonPrinter.print(value.asInstanceOf[DynamicMessage])
updater.set(ordinal, UTF8String.fromString(jsonStr))
+ // Handle well known wrapper types. We unpack the value field instead of keeping
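
For context, a minimal standalone sketch of what the 'Any' branch above does (the packed Int64Value payload is illustrative, and the PR builds its jsonPrinter elsewhere; this sketch assumes a type registry is needed to resolve the packed type): JsonFormat renders the packed message as JSON, which Spark then stores as a StringType value.

    import com.google.protobuf.{Any, Int64Value}
    import com.google.protobuf.util.JsonFormat

    // Pack a well-known wrapper message into an 'Any'.
    val any = Any.pack(Int64Value.of(42L))

    // The printer needs a type registry to resolve the type packed inside the Any.
    val registry =
      JsonFormat.TypeRegistry.newBuilder().add(Int64Value.getDescriptor).build()
    val printer = JsonFormat.printer().usingTypeRegistry(registry)

    // Prints: {"@type":"type.googleapis.com/google.protobuf.Int64Value","value":"42"}
    println(printer.print(any))
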
Review Comment:
Not sure I follow. This is a serde between Protobuf and Spark structs. Consumers
and producers are expected to know the schema.
Can we have a concrete example where this makes a difference? What problem
are we solving? Wrapper types are used because the wrapper itself is
important; otherwise there is no need to use one. I don't see how stripping
the wrapper is the right thing.
These are just utilities, not a Protobuf spec.
Did you check the generated Java code? It treats a wrapper as just another
Protobuf message; there is no special treatment. Why should Spark be different?
Can we have a fully spelled-out example in Spark that shows the benefits?
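
To make the "no special treatment" point concrete, a minimal sketch (the
values are illustrative): in generated Java code a wrapper such as
google.protobuf.Int64Value is just another message, and inside a containing
message it would be accessed through the usual hasField()/getField() pattern.

    import com.google.protobuf.Int64Value

    // Int64Value is an ordinary generated message: the payload sits in a
    // regular singular field named "value" and is read through the normal
    // accessor. In a containing message, generated code likewise exposes the
    // usual hasField()/getField() pair with no special casing for wrappers.
    val wrapped: Int64Value = Int64Value.of(30L)
    val unwrapped: Long = wrapped.getValue // the caller unwraps explicitly
    println(unwrapped) // prints 30

Kept as a message, the corresponding Catalyst value would presumably be a
struct with a single value field (e.g. struct<value: bigint>); the change
under discussion flattens it to the bare bigint.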