rangadi commented on code in PR #44643:
URL: https://github.com/apache/spark/pull/44643#discussion_r1457918795
##########
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala:
##########
@@ -1124,7 +1124,55 @@ class ProtobufFunctionsSuite extends QueryTest with SharedSparkSession with Prot
}
}
-  test("Corner case: empty recursive proto fields should be dropped") {
+  test("retain empty proto fields") {
+    val options = Map("recursive.fields.max.depth" -> "4", "retain.empty.message" -> "true")
+
+    // EmptyRecursiveProto at the top level. It will be an empty struct.
+    checkWithFileAndClassName("EmptyProto") {
+      case (name, descFilePathOpt) =>
+        val df = emptyBinaryDF.select(
+          from_protobuf_wrapper($"binary", name, descFilePathOpt, options).as("empty_proto")
+        )
+        // Top level empty message is retained without adding dummy column to the schema.
+        assert(df.schema == structFromDDL("empty_proto struct<>"))
Review Comment:
Why is that? Isn't it better to be consistent: "if there is an empty protobuf message, it will have a dummy column"?
Users don't distinguish between top-level and nested messages.
Btw, does the Parquet test below include an empty top-level struct? Could you point to it?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]