rangadi commented on code in PR #40686:
URL: https://github.com/apache/spark/pull/40686#discussion_r1179422692


##########
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala:
##########
@@ -46,9 +47,35 @@ private[sql] class ProtobufOptions(
   // record has more depth than the allowed value for recursive fields, it 
will be truncated
   // and corresponding fields are ignored (dropped).
   val recursiveFieldMaxDepth: Int = 
parameters.getOrElse("recursive.fields.max.depth", "-1").toInt
+
+  // Whether or not to explicitly materialize the zero values for fields
+  // without field presence information 
https://protobuf.dev/programming-guides/field_presence/.
+  // This includes most fields in proto3.
+  //
+  // For example, if we have a proto like
+  // ```
+  // syntax = "proto3";
+  // message Example {
+  //   string s = 1;
+  //   int64 i = 2;
+  // }
+  // ```
+  //
+  // And we have the serialized representation of the following proto:
+  // `Example(s="", i=0)`
+  //
+  // The result after calling from_protobuf without this flag set would be:
+  // `{"s": null, "i": null}`
+  //
+  // To explicitly materialize that default zero value, as readers in some 
other languages
+  // will do, this flag can be set to get a deserialized result like:
+  // `{"s": "", "i": 0}`
+  val materializeZeroValues: Boolean =
+    parameters.getOrElse(ProtobufOptions.materializeZeroValues, 
false.toString).toBoolean
 }
 
 private[sql] object ProtobufOptions {
+  val materializeZeroValues = "materializeZeroValues"

Review Comment:
   Also 'materialize' might be a bit confusing too (since if often refers to 
materializing intermediate Dataframes/RDD in Spark context).  I don't have have 
alternate name. 
   We could go with `materialize.default.values` for now. I don't think we need 
`presenceless` in it.
   The goal is the documentation should be detailed enough that users 
understand the behaviour. Naming could be fairly simple.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to