justaparth commented on code in PR #40686:
URL: https://github.com/apache/spark/pull/40686#discussion_r1179804071


##########
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala:
##########
@@ -46,6 +46,42 @@ private[sql] class ProtobufOptions(
   // record has more depth than the allowed value for recursive fields, it will be truncated
   // and corresponding fields are ignored (dropped).
   val recursiveFieldMaxDepth: Int = parameters.getOrElse("recursive.fields.max.depth", "-1").toInt
+
+  // For fields without presence information, there is ambiguity in serialized protos
+  // as to whether the field was never written or was written with its zero value.
+  // This is because such fields are not serialized if they contain their zero value.
+  // This includes most fields in proto3.
+  // Ref: https://protobuf.dev/programming-guides/field_presence
+  // https://protobuf.dev/programming-guides/field_presence/
+  //  #presence-in-tag-value-stream-wire-format-serialization
+  //
+  // By default, we will deserialize both cases as null. However, this flag can
+  // choose to explicitly deserialize as the zero value for the type, as
+  // libraries in some other languages will do.
+  //
+  // For example, if we have a proto like
+  // ```
+  // syntax = "proto3";
+  // message Person {
+  //   string name = 1;
+  //   int64 age = 2;
+  // }
+  // ```
+  //
+  // And we have the serialized representation of the following proto:
+  // `Person(name="", age=0)`

Review Comment:
   It would be `{"name": null, "age": null}` by default, and `{"name": "", "age": 0}`
   with the materialize-zero-value flag set. I think the tests on lines 1123-1254
   demonstrate this behavior.
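
To make the two outcomes concrete, here is a minimal standalone sketch in plain Scala. It only models the semantics described above and is not the connector's deserializer: `WirePerson`, `toRow`, and the `materializeZeroValues` parameter are hypothetical names chosen for illustration, and the real option key is whatever the PR defines.

```scala
// Hypothetical model of the two behaviors; this is not Spark's actual code.
object ZeroValueSketch {
  // Decoded view of the proto3 `Person` message. Fields that held their zero
  // value are not written to the wire, so they are modeled here as None.
  final case class WirePerson(name: Option[String], age: Option[Long])

  // `materializeZeroValues` is an illustrative parameter name, not the real option key.
  def toRow(p: WirePerson, materializeZeroValues: Boolean): Map[String, Any] = {
    if (materializeZeroValues) {
      // Absent fields are filled with the zero value for their type.
      Map("name" -> p.name.getOrElse(""), "age" -> p.age.getOrElse(0L))
    } else {
      // Default behavior: absent fields become null.
      Map("name" -> p.name.orNull, "age" -> p.age.map(Long.box).orNull)
    }
  }

  def main(args: Array[String]): Unit = {
    // Person(name="", age=0) serializes to zero bytes, so both fields are absent.
    val person = WirePerson(name = None, age = None)
    println(toRow(person, materializeZeroValues = false))
    println(toRow(person, materializeZeroValues = true))
  }
}
```

The point of the sketch is that the input is identical in both calls: the serialized bytes cannot distinguish "never set" from "set to zero", so the flag alone decides which reading the reader gets.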



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

