rangadi commented on code in PR #44643:
URL: https://github.com/apache/spark/pull/44643#discussion_r1457918795
##########
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala:
##########
@@ -1124,7 +1124,55 @@ class ProtobufFunctionsSuite extends QueryTest with SharedSparkSession with Prot
}
}
-  test("Corner case: empty recursive proto fields should be dropped") {
+  test("retain empty proto fields") {
+    val options = Map("recursive.fields.max.depth" -> "4", "retain.empty.message" -> "true")
+
+    // EmptyRecursiveProto at the top level. It will be an empty struct.
+    checkWithFileAndClassName("EmptyProto") {
+      case (name, descFilePathOpt) =>
+        val df = emptyBinaryDF.select(
+          from_protobuf_wrapper($"binary", name, descFilePathOpt, options).as("empty_proto")
+        )
+        // Top level empty message is retained without adding dummy column to the schema.
+        assert(df.schema == structFromDDL("empty_proto struct<>"))
Review Comment:
Why is that? Isn't it better to be consistent: "if there is an empty protobuf message, it will have a dummy column"?
Users don't distinguish between top-level and nested messages.
Btw, does the Parquet test below include an empty top-level struct? Could you point to it?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]