rangadi commented on code in PR #40686:
URL: https://github.com/apache/spark/pull/40686#discussion_r1181135882


##########
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDeserializer.scala:
##########
@@ -299,6 +298,29 @@ private[sql] class ProtobufDeserializer(
     }
   }
 
+  private def getFieldValue(record: DynamicMessage, field: FieldDescriptor): 
AnyRef = {
+    // We return a value if one of:
+    // - the field is repeated
+    // - the field is explicitly present in the serialized proto
+    // - the field is proto2 with a default
+    // - field presence is not available and materializeZeroValues is set
+    //
+    // Repeated fields have to be treated separately as they cannot have 
`hasField`
+    // called on them. And we only materialize zero values for fields without 
presence
+    // information because that flag controls the behavior for those fields in 
the ambiguous
+    // case of "unset" or "set to zero value". See the docs in 
[[ProtobufOptions]] for more
+    // details.
+    if (
+      field.isRepeated
+        || record.hasField(field)
+        || field.hasDefaultValue
+        || (!field.hasPresence && this.materializeZeroValues)) {

Review Comment:
   > lit(null).as("optional_int"), 
   
   This [line at 
1229](https://github.com/apache/spark/pull/40686/files#diff-f5dc81b1d20dde97f9d1dd26e993f811fa096bf68936ab5e449b9937314585d8R1229)
 ? 
   
   I am more confused about the feature now. I thought all scaler fields like 
'int' will have a value with this option set. 
   
   What is the spec we are adhering to? What is the use case we are solving? 
May be we could update the flag documentation. 
   
   Is this essentially 'materialize materialize protobuf V3 scalar fields that 
are not declared optional'? 
   
   We are giving too much weightage to `optional` keyword it looks like. 
Logically all fields are optional in Proto3 by default. 
   
   What happens to a `optional int` in V2 proto? Will it behave like proto3 
`int` proto2 `optional int`?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to