justaparth commented on code in PR #40686:
URL: https://github.com/apache/spark/pull/40686#discussion_r1181124590


##########
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDeserializer.scala:
##########
@@ -299,6 +298,29 @@ private[sql] class ProtobufDeserializer(
     }
   }
 
+  private def getFieldValue(record: DynamicMessage, field: FieldDescriptor): 
AnyRef = {
+    // We return a value if one of:
+    // - the field is repeated
+    // - the field is explicitly present in the serialized proto
+    // - the field is proto2 with a default
+    // - field presence is not available and materializeZeroValues is set
+    //
+    // Repeated fields have to be treated separately as they cannot have 
`hasField`
+    // called on them. And we only materialize zero values for fields without 
presence
+    // information because that flag controls the behavior for those fields in 
the ambiguous
+    // case of "unset" or "set to zero value". See the docs in 
[[ProtobufOptions]] for more
+    // details.
+    if (
+      field.isRepeated
+        || record.hasField(field)
+        || field.hasDefaultValue
+        || (!field.hasPresence && this.materializeZeroValues)) {

Review Comment:
   > I.e. if we replace !field.hasPresence with field.getJavaType != MESSAGE, I 
think the tests pass.
   
   actually, the tests, as written, will not pass. specifically, if we change 
`!field.hasPresence` to `field.getJavaType != message` then proto3 optional 
fields fields that were never set on the initial proto will get deserialized as 
their default value. the test 
[here](https://github.com/apache/spark/pull/40686/files#diff-f5dc81b1d20dde97f9d1dd26e993f811fa096bf68936ab5e449b9937314585d8R1196-R1239)
 checks that.
   
   just to make sure, i did try this locally and verified the tests failed in 
the way described above.
   
   > We should also explicitly mention in the documentation about messages. 
Most readers don't understands what 'has presence information' means. They 
shouldn't have to read detailed protobuf spec to understand what this flag does.
   How about this: add the message check in addition to !hasPresense() check? 
(It's no op).
   
   ah, if documentation / clarity is the issue i'm more than happy to write up 
some more comments or clear up the documentation!
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to