rangadi commented on code in PR #40686:
URL: https://github.com/apache/spark/pull/40686#discussion_r1181155237
##########
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDeserializer.scala:
##########
@@ -299,6 +298,29 @@ private[sql] class ProtobufDeserializer(
}
}
+ private def getFieldValue(record: DynamicMessage, field: FieldDescriptor): AnyRef = {
+ // We return a value if one of:
+ // - the field is repeated
+ // - the field is explicitly present in the serialized proto
+ // - the field is proto2 with a default
+ // - field presence is not available and materializeZeroValues is set
+ //
+ // Repeated fields have to be treated separately as they cannot have `hasField`
+ // called on them. And we only materialize zero values for fields without presence
+ // information because that flag controls the behavior for those fields in the ambiguous
+ // case of "unset" or "set to zero value". See the docs in [[ProtobufOptions]] for more
+ // details.
+ if (
+ field.isRepeated
+ || record.hasField(field)
+ || field.hasDefaultValue
+ || (!field.hasPresence && this.materializeZeroValues)) {
Review Comment:
> aren't all the types important? by this logic we should have a callout for every single type.
What? Do you want to exclude int etc.? Message is different because it can
contain other messages and easily blow up the struct. Remember that in an earlier
comment you could not explain whether Messages would be serialized or not? That is
because it was not easy to see why they would or would not be included. You tested
it, but didn't have an explanation.
> The goal of this PR is to give an option for developer who want Spark's protobuf to struct deserialization behavior comply to proto3 specs.
@pang-wu which spec is this? It will still have null for fields. I am OK with
the feature, but I don't understand the motivation.
We are not providing any extra information in the Spark struct.
Could you give an example of a problem this lets you solve?
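For reference, the four-way condition in the hunk can be modelled as a small truth table. This is only a minimal sketch: `FieldInfo`, `FieldValueSketch`, and `returnsValue` are hypothetical names, not part of Spark or protobuf, and `explicitlySet` stands in for `record.hasField(field)`. It illustrates why an unset Message field stays null even with the flag on (message fields always carry presence information), while an unset proto3 scalar does not.

```scala
// Hypothetical stand-in for the FieldDescriptor / DynamicMessage state that the
// condition reads. Not part of the Spark or protobuf APIs.
final case class FieldInfo(
    isRepeated: Boolean,
    hasPresence: Boolean,
    hasDefaultValue: Boolean,
    explicitlySet: Boolean) // models record.hasField(field)

object FieldValueSketch {
  // Mirrors the condition in the hunk: true when a value would be returned,
  // false when the field would end up null in the Spark struct.
  def returnsValue(f: FieldInfo, materializeZeroValues: Boolean): Boolean =
    f.isRepeated ||
      f.explicitlySet ||
      f.hasDefaultValue ||
      (!f.hasPresence && materializeZeroValues)

  // Unset proto3 scalar declared without `optional`: no presence information.
  val unsetProto3Int: FieldInfo = FieldInfo(
    isRepeated = false, hasPresence = false, hasDefaultValue = false, explicitlySet = false)

  // Unset nested message: presence is tracked, so the flag does not apply.
  val unsetMessage: FieldInfo = FieldInfo(
    isRepeated = false, hasPresence = true, hasDefaultValue = false, explicitlySet = false)

  def main(args: Array[String]): Unit = {
    println(returnsValue(unsetProto3Int, materializeZeroValues = false)) // false
    println(returnsValue(unsetProto3Int, materializeZeroValues = true))  // true
    println(returnsValue(unsetMessage, materializeZeroValues = true))    // false
  }
}
```

Under these assumptions, turning the flag on changes the result only for fields without presence information, so nested messages are not expanded by it.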
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]