chaoqin-li1123 commented on code in PR #44643:
URL: https://github.com/apache/spark/pull/44643#discussion_r1469066080


##########
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala:
##########
@@ -207,6 +207,23 @@ private[sql] class ProtobufOptions(
   //    nil => nil, Int32Value(0) => 0, Int32Value(100) => 100.
   val unwrapWellKnownTypes: Boolean =
     parameters.getOrElse("unwrap.primitive.wrapper.types", false.toString).toBoolean
+
+  // Since Spark doesn't allow writing empty StructType, empty proto message type will be
+  // dropped by default. Setting this option to true will insert a dummy column to empty proto
+  // message so that the empty message will be retained.
+  // For example, an empty message is used as field in another message:
+  //
+  // ```
+  // message A {}
+  // message B {A a = 1, string name = 2}
+  // ```
+  //
+  // By default, in the Spark schema field a will be dropped, which results in schema
+  // b struct<name: string>
+  // If retain.empty.message.types=true, field a will be retained by inserting a dummy column.
+  // b struct<name: string, a struct<__dummy_field_in_empty_struct: string>>

Review Comment:
   Nice catch, fixed.



##########
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala:
##########
@@ -1136,7 +1208,8 @@ class ProtobufFunctionsSuite extends QueryTest with SharedSparkSession with Prot
           val df = emptyBinaryDF.select(
             from_protobuf_wrapper($"binary", name, descFilePathOpt, options).as("empty_proto")
           )
-        assert(df.schema == structFromDDL("empty_proto struct<>"))
+        assert(df.schema ==
+          structFromDDL("empty_proto struct<>"))

Review Comment:
   This test doesn't configure the option explicitly, so the default behavior is
   the same as before.
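
   To illustrate why leaving the option unset keeps the old behavior, here is a minimal sketch that mimics the `getOrElse(..., false.toString).toBoolean` pattern from the `ProtobufOptions` hunk above. `OptionsSketch` and `retainEmptyMessage` are hypothetical names for illustration only, not the actual Spark class or field; only the option key `retain.empty.message.types` is taken from the diff.

```scala
// Simplified stand-in for option parsing; NOT the real ProtobufOptions class.
final class OptionsSketch(parameters: Map[String, String]) {
  // Hypothetical field name. A missing key falls back to "false",
  // following the same pattern as unwrap.primitive.wrapper.types in the diff.
  val retainEmptyMessage: Boolean =
    parameters.getOrElse("retain.empty.message.types", false.toString).toBoolean
}

object OptionsSketchDemo {
  def main(args: Array[String]): Unit = {
    // No option supplied: the flag stays off, so empty messages are still dropped.
    val defaults = new OptionsSketch(Map.empty)
    // Option supplied explicitly: the flag is on.
    val enabled = new OptionsSketch(Map("retain.empty.message.types" -> "true"))

    println(s"default = ${defaults.retainEmptyMessage}")
    println(s"enabled = ${enabled.retainEmptyMessage}")
  }
}
```

   Under that assumption, a test that passes no options sees the flag as false, which is why the `empty_proto struct<>` assertion is unchanged.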



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

