justaparth commented on code in PR #40686:
URL: https://github.com/apache/spark/pull/40686#discussion_r1180843562


##########
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDeserializer.scala:
##########
@@ -299,6 +298,29 @@ private[sql] class ProtobufDeserializer(
     }
   }
 
+  private def getFieldValue(record: DynamicMessage, field: FieldDescriptor): 
AnyRef = {
+    // We return a value if one of:
+    // - the field is repeated
+    // - the field is explicitly present in the serialized proto
+    // - the field is proto2 with a default
+    // - field presence is not available and materializeZeroValues is set
+    //
+    // Repeated fields have to be treated separately as they cannot have 
`hasField`
+    // called on them. And we only materialize zero values for fields without 
presence
+    // information because that flag controls the behavior for those fields in 
the ambiguous
+    // case of "unset" or "set to zero value". See the docs in 
[[ProtobufOptions]] for more
+    // details.

Review Comment:
   and in addition, to respond to your comment about why we need hasPresence if 
we already have hasField, 
here:https://github.com/apache/spark/pull/40686#discussion_r1179432031
   
   on line 317 we know that `record.hasField(field)` is false, indicating that 
the field has no value in the serialized proto. However, what we don't know is 
_why_ it is false. 
   
   If `field.hasPresence()` is true, then we know for sure that the field was 
never set (i.e. optional proto3, proto2, etc.), and it would be incorrect for 
us to materialize the zero value even if the flag is set.
   
   If `field.hasPresence()` is false, then its the situation where the "not 
set" and "zero value" are not distinguishable and we check the flag to see if 
we should materialize the zero value



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to