justaparth commented on code in PR #40686:
URL: https://github.com/apache/spark/pull/40686#discussion_r1181124838


##########
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDeserializer.scala:
##########
@@ -299,6 +298,29 @@ private[sql] class ProtobufDeserializer(
     }
   }
 
+  private def getFieldValue(record: DynamicMessage, field: FieldDescriptor): 
AnyRef = {
+    // We return a value if one of:
+    // - the field is repeated
+    // - the field is explicitly present in the serialized proto
+    // - the field is proto2 with a default
+    // - field presence is not available and materializeZeroValues is set
+    //
+    // Repeated fields have to be treated separately as they cannot have 
`hasField`
+    // called on them. And we only materialize zero values for fields without 
presence
+    // information because that flag controls the behavior for those fields in 
the ambiguous
+    // case of "unset" or "set to zero value". See the docs in 
[[ProtobufOptions]] for more
+    // details.

Review Comment:
   @rangadi with respect to message serialization, what do you think about this 
[test](https://github.com/apache/spark/pull/40686/files#diff-f5dc81b1d20dde97f9d1dd26e993f811fa096bf68936ab5e449b9937314585d8R1196-R1239)?
 in it, there is a message field that is never set originally and in 
deserializing it with/without the flag it is null. is that expected / 
unexpected to you? 
   
   > Exclude message types from serializing explicitly (no matter what our 
presense policy)
   Give a rational for using presence information. I.e. why should optional 
string a and string a behave differently with this flag.
   
   
   Sorry just to clarify, are you asking for a rationale here in the comments, 
or would you like me to include it in the documentation so that its a bit more 
clear?
   
   > In fact this check does not even matter for scala types due to hadField() 
earlier. So there is no need to have it.
   
   Ah check the other thread! i commented there as to how this breaks if we 
don't use hasPresence 
https://github.com/apache/spark/pull/40686#discussion_r1181124590



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to