justaparth commented on code in PR #40686:
URL: https://github.com/apache/spark/pull/40686#discussion_r1179011020


##########
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDeserializer.scala:
##########
@@ -288,7 +289,21 @@ private[sql] class ProtobufDeserializer(
       var skipRow = false
       while (i < validFieldIndexes.length && !skipRow) {
         val field = validFieldIndexes(i)
-        val value = if (field.isRepeated || field.hasDefaultValue || 
record.hasField(field)) {
+
+        // If `materializeZeroValues` is true, the written field will contain 
the
+        // default zero value for its type (e.g. 0 for int, "" for string, etc:
+        // https://protobuf.dev/programming-guides/proto3/#default).
+        // Otherwise, the field will be null unless the serialized proto 
explicitly
+        // contains the field, or it has an explicit default (
+        // proto2 only 
https://protobuf.dev/programming-guides/proto2/#optional)
+        // Note that in proto3, the serialized proto will not contain the field
+        // if the value was explicitly set to its zero value. See:
+        // 
https://protobuf.dev/programming-guides/field_presence/#presence-in-proto3-apis

Review Comment:
   I've done a quick update to this PR! I've added some additional tests and 
made the code more clear as to how field presence is the main factor to 
consider when deserializing



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to