justaparth commented on code in PR #40686:
URL: https://github.com/apache/spark/pull/40686#discussion_r1179241756


##########
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala:
##########
@@ -46,9 +47,35 @@ private[sql] class ProtobufOptions(
   // record has more depth than the allowed value for recursive fields, it 
will be truncated
   // and corresponding fields are ignored (dropped).
   val recursiveFieldMaxDepth: Int = 
parameters.getOrElse("recursive.fields.max.depth", "-1").toInt
+
+  // Whether or not to explicitly materialize the zero values for fields
+  // without field presence information 
https://protobuf.dev/programming-guides/field_presence/.
+  // This includes most fields in proto3.
+  //
+  // For example, if we have a proto like
+  // ```
+  // syntax = "proto3";
+  // message Example {
+  //   string s = 1;
+  //   int64 i = 2;
+  // }
+  // ```
+  //
+  // And we have the serialized representation of the following proto:
+  // `Example(s="", i=0)`
+  //
+  // The result after calling from_protobuf without this flag set would be:
+  // `{"s": null, "i": null}`
+  //
+  // To explicitly materialize that default zero value, as readers in some 
other languages
+  // will do, this flag can be set to get a deserialized result like:
+  // `{"s": "", "i": 0}`
+  val materializeZeroValues: Boolean =
+    parameters.getOrElse(ProtobufOptions.materializeZeroValues, 
false.toString).toBoolean
 }
 
 private[sql] object ProtobufOptions {
+  val materializeZeroValues = "materializeZeroValues"

Review Comment:
   > We can remove this and use actual string.
   
   sorry, could you explain more what you mean by this?
   
   >  Also use '.' separated names, similar to recursive.fields.max.depth. How 
about: "materialize.default.values"z
   
   sure! i'll follow the convention in the file 👍
   
   i'm realizing the name is a bit odd, as the behavior is that we materialize 
zero values _for fields without presence information_. so maybe the right to 
call it would be `materialize.presenceless.default.values`? though thats kind 
of a mouthful.. would love if you have any thoughts on a good name (i'll keep 
thinking while im updating the PR as well)
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to