rangadi commented on code in PR #41498:
URL: https://github.com/apache/spark/pull/41498#discussion_r1221858347
##########
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDeserializer.scala:
##########
@@ -247,12 +247,86 @@ private[sql] class ProtobufDeserializer(
updater.setLong(ordinal, micros +
TimeUnit.NANOSECONDS.toMicros(nanoSeconds))
case (MESSAGE, StringType)
- if protoType.getMessageType.getFullName == "google.protobuf.Any" =>
+ if protoType.getMessageType.getFullName == "google.protobuf.Any" =>
(updater, ordinal, value) =>
// Convert 'Any' protobuf message to JSON string.
val jsonStr = jsonPrinter.print(value.asInstanceOf[DynamicMessage])
updater.set(ordinal, UTF8String.fromString(jsonStr))
+ // Handle well known wrapper types. We unpack the value field instead of keeping
Review Comment:
> Under com.google.protobuf, there are some [well known wrapper
> types](https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/wrappers.proto),
> useful for distinguishing between absence of primitive fields and their
> default values
This says the purpose is to tell whether a value was explicitly set or is just the default.
How does this PR provide that functionality? With this PR, a user can't
distinguish between `int32 int_value` and `IntValue int_value`.
It would be good to have a Spark example where this helps.
I am not yet sure about including a lot of customization like this.
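To make the question concrete, here is a minimal sketch in plain Scala, with no Spark or protobuf dependency. Every name in it is hypothetical, and it assumes (this is not confirmed by the PR) that an unset wrapper field would surface as NULL in the unwrapped column. Under that assumption both field kinds produce the same column type, and only nullability carries the set/unset information.

```scala
// Hypothetical model, not Spark or protobuf code: all names are illustrative.
object WrapperNullability {
  // Models a proto3 `int32 int_value` field: no presence information,
  // so an unset field reads back as the default value 0.
  final case class PlainMsg(intValue: Int = 0)

  // Models an `IntValue int_value` field: a message-typed field that can be absent.
  final case class WrappedMsg(intValue: Option[Int] = None)

  // What the resulting column cell would hold; None models SQL NULL.
  // ASSUMPTION: an unset wrapper maps to NULL after unwrapping.
  def plainColumn(m: PlainMsg): Option[Int] = Some(m.intValue)
  def unwrappedColumn(m: WrappedMsg): Option[Int] = m.intValue

  def main(args: Array[String]): Unit = {
    println(plainColumn(PlainMsg()))              // unset plain field
    println(unwrappedColumn(WrappedMsg()))        // unset wrapper
    println(unwrappedColumn(WrappedMsg(Some(0)))) // wrapper explicitly set to 0
  }
}
```

If the unset wrapper were instead mapped to the default value, the second and third cases would be indistinguishable, which is the concern raised above.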
##########
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDeserializer.scala:
##########
@@ -247,12 +247,86 @@ private[sql] class ProtobufDeserializer(
updater.setLong(ordinal, micros +
TimeUnit.NANOSECONDS.toMicros(nanoSeconds))
case (MESSAGE, StringType)
- if protoType.getMessageType.getFullName == "google.protobuf.Any" =>
+ if protoType.getMessageType.getFullName == "google.protobuf.Any" =>
(updater, ordinal, value) =>
// Convert 'Any' protobuf message to JSON string.
val jsonStr = jsonPrinter.print(value.asInstanceOf[DynamicMessage])
updater.set(ordinal, UTF8String.fromString(jsonStr))
+ // Handle well known wrapper types. We unpack the value field instead of keeping
+ // them as nested structs
+ case (MESSAGE, BooleanType)
+ if protoType.getMessageType.getFullName == BoolValue.getDescriptor.getFullName =>
+ (updater, ordinal, value) =>
+ val dm = value.asInstanceOf[DynamicMessage]
+ updater.setBoolean(
+ ordinal,
+ dm.getField(
+ dm.getDescriptorForType.findFieldByName("value")
Review Comment:
Can use `BoolValue.getDescriptor`, right?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]