justaparth commented on code in PR #40686:
URL: https://github.com/apache/spark/pull/40686#discussion_r1180841979


##########
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDeserializer.scala:
##########
@@ -288,9 +289,34 @@ private[sql] class ProtobufDeserializer(
       var skipRow = false
       while (i < validFieldIndexes.length && !skipRow) {
         val field = validFieldIndexes(i)
-        val value = if (field.isRepeated || field.hasDefaultValue || record.hasField(field)) {
-          record.getField(field)
-        } else null
+
+        // In case field presence is defined, we can use it to figure out whether
+        // a field is present, and output a value (or null) based on that.
+        // If a field has no presence info, we'll populate it if:
+        // - It's explicitly available in the *serialized* proto.
+        // - its a proto2 field with an explicit default value
+        // - `materializeZeroValues` has been set. In this case
+        //    getField will return the default value for the field's type.
+        // - It's a repeated field, which gets populated as []
+        // Please see: https://protobuf.dev/programming-guides/field_presence
+        val value = if (field.hasPresence) {

Review Comment:
   Resolving this thread, since the code has changed a bit and we have a new thread for it!
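
For reference, the population rules listed in the code comment of the hunk above can be sketched as a small standalone model. This is only an illustration under stated assumptions, not the PR's code: `FieldInfo`, `PresenceSketch`, `shouldPopulate`, and `setInMessage` are hypothetical names, with `FieldInfo` standing in for the three descriptor properties the diff reads and `setInMessage` standing in for `record.hasField(field)`.

```scala
// Hypothetical stand-in for the descriptor properties used in the diff
// (hasPresence, isRepeated, hasDefaultValue). Not a protobuf class.
final case class FieldInfo(
    hasPresence: Boolean,
    isRepeated: Boolean,
    hasDefaultValue: Boolean)

object PresenceSketch {

  /**
   * Returns true if a value should be emitted for the field, false if the
   * output column should be null.
   *
   * @param setInMessage          models record.hasField(field): the field is
   *                              explicitly set in the serialized message
   * @param materializeZeroValues models the option named in the code comment
   */
  def shouldPopulate(
      field: FieldInfo,
      setInMessage: Boolean,
      materializeZeroValues: Boolean): Boolean = {
    if (field.hasPresence) {
      // Presence is tracked, so it alone decides between value and null.
      setInMessage
    } else {
      // No presence info: fall back to the rules listed in the comment.
      field.isRepeated ||
      field.hasDefaultValue ||
      setInMessage ||
      materializeZeroValues
    }
  }
}
```

In this model, a field with presence that is unset stays null even when `materializeZeroValues` is on, while an unset field without presence is populated only if it is repeated, has an explicit default, or `materializeZeroValues` is set.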



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

