pang-wu commented on code in PR #41498:
URL: https://github.com/apache/spark/pull/41498#discussion_r1223754931
##########
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDeserializer.scala:
##########
@@ -247,12 +247,86 @@ private[sql] class ProtobufDeserializer(
updater.setLong(ordinal, micros +
TimeUnit.NANOSECONDS.toMicros(nanoSeconds))
case (MESSAGE, StringType)
- if protoType.getMessageType.getFullName == "google.protobuf.Any" =>
+ if protoType.getMessageType.getFullName == "google.protobuf.Any" =>
(updater, ordinal, value) =>
// Convert 'Any' protobuf message to JSON string.
val jsonStr = jsonPrinter.print(value.asInstanceOf[DynamicMessage])
updater.set(ordinal, UTF8String.fromString(jsonStr))
+ // Handle well known wrapper types. We unpack the value field instead of keeping
Review Comment:
So the motivating example is: someone wants to convert the struct generated
by Spark to JSON and compare it with (or maintain compatibility with) JSON
generated from the same protobuf message by Go's or Java's JsonFormat. Today
the two are not comparable, because Spark does not follow the spec when
translating well-known types.
A team that wants to do such a comparison has to write a custom converter,
and writing one is painful because 1) the type information is lost, and
2) even with the type info, there is no easy way to know at what level the
struct should be dropped and replaced with a scalar -- this is a real
use case we are running into.
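For a concrete illustration of the mismatch (not part of this PR), here is a
minimal sketch using protobuf-java's `JsonFormat`; `StringValue` stands in for
any well-known wrapper type. The spec-compliant printer unpacks the wrapper to
its scalar payload, whereas a struct-based translation would surface the
`value` field as a nested object:

```scala
import com.google.protobuf.StringValue
import com.google.protobuf.util.JsonFormat

// Hypothetical standalone example; StringValue stands in for any
// well-known wrapper type (Int64Value, BoolValue, ...).
object WrapperJsonExample {
  def main(args: Array[String]): Unit = {
    val wrapped = StringValue.newBuilder().setValue("hello").build()
    // The proto3 JSON spec says wrapper types serialize as their
    // scalar payload, so JsonFormat prints: "hello"
    println(JsonFormat.printer().print(wrapped))
    // A struct-based translation of the same message would instead
    // correspond to: {"value": "hello"}
  }
}
```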