justaparth commented on code in PR #40686:
URL: https://github.com/apache/spark/pull/40686#discussion_r1179803443


##########
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDeserializer.scala:
##########
@@ -299,6 +298,29 @@ private[sql] class ProtobufDeserializer(
     }
   }
 
+  private def getFieldValue(record: DynamicMessage, field: FieldDescriptor): 
AnyRef = {
+    // We return a value if one of:
+    // - the field is repeated
+    // - the field is explicitly present in the serialized proto
+    // - the field is proto2 with a default
+    // - field presence is not available and materializeZeroValues is set
+    //
+    // Repeated fields have to be treated separately as they cannot have 
`hasField`
+    // called on them. And we only materialize zero values for fields without 
presence
+    // information because that flag controls the behavior for those fields in 
the ambiguous
+    // case of "unset" or "set to zero value". See the docs in 
[[ProtobufOptions]] for more
+    // details.

Review Comment:
   one example is this test: `test("test unset values - proto3") {` on line 
1196 of `ProtobufFunctionsSuit.scala` (sorry, i'm not sure how to link to it)
   
   in that test, there are some proto3 optional fields, like `optional_int` 
which are not set to any value. In this case, even with materialize zero values 
set, we don't want to deserialize it as 0. This is because for proto3 optional 
fields we have field presence and there is no ambiguity between "unset" and 
"set to zero value", so it'd actually be _incorrect_ of us to try to set it to 
0.
   
   basically the materialize zero values flag is only intended to choose the 
behavior in cases where "unset" and "set to zero value" are not 
distinguishable. the default behavior is to deserialize both to null. with the 
flag, it is to deserialize both as the zero value. 
   
   the latter behavior is what the proto3 docs actually talk about (and was the 
motivation for this PR, as we have other places in our codebase where default 
values get serialized like this 
https://protobuf.dev/programming-guides/proto3/#default)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to