rangadi commented on code in PR #40011:
URL: https://github.com/apache/spark/pull/40011#discussion_r1105466945
##########
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala:
##########
@@ -385,34 +387,12 @@ class ProtobufFunctionsSuite extends QueryTest with
SharedSparkSession with Prot
}
}
- test("Handle recursive fields in Protobuf schema, A->B->A") {
- val schemaA = ProtobufUtils.buildDescriptor(testFileDesc, "recursiveA")
Review Comment:
This extra code to create actual records is not required since we are
testing an AnalysisException. It does not need real data.
##########
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala:
##########
@@ -118,17 +118,17 @@ object SchemaConverters {
// discarded.
// SQL Schema for the protobuf message `message Person { string name =
1; Person bff = 2}`
// will vary based on the value of "recursive.fields.max.depth".
- // 0: struct<name: string, bff: null>
- // 1: struct<name string, bff: <name: string, bff: null>>
- // 2: struct<name string, bff: <name: string, bff: struct<name:
string, bff: null>>> ...
+ // 1: struct<name: string, bff: null>
+ // 2: struct<name string, bff: <name: string, bff: null>>
+ // 3: struct<name string, bff: <name: string, bff: struct<name:
string, bff: null>>> ...
val recordName = fd.getMessageType.getFullName
val recursiveDepth = existingRecordNames.getOrElse(recordName, 0)
val recursiveFieldMaxDepth = protobufOptions.recursiveFieldMaxDepth
- if (existingRecordNames.contains(recordName) &&
(recursiveFieldMaxDepth < 0 ||
+ if (existingRecordNames.contains(recordName) &&
(recursiveFieldMaxDepth <= 0 ||
recursiveFieldMaxDepth > 10)) {
throw
QueryCompilationErrors.foundRecursionInProtobufSchema(fd.toString())
} else if (existingRecordNames.contains(recordName) &&
- recursiveDepth > recursiveFieldMaxDepth) {
+ recursiveDepth >= recursiveFieldMaxDepth) {
Review Comment:
The main bug fix:
We allowed recursive depth go over by one. This makes the setting of '0'
does not work as documented (earlier documentation stated even the first
occurrence would be removed it it was set to 0).
Removing the first occurrence does not make sense since we don't know if
this field is recursive until the second occurrence.
##########
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala:
##########
@@ -853,7 +824,7 @@ class ProtobufFunctionsSuite extends QueryTest with
SharedSparkSession with Prot
val df = Seq(employee.toByteArray).toDF("protoEvent")
val options = new java.util.HashMap[String, String]()
- options.put("recursive.fields.max.depth", "1")
+ options.put("recursive.fields.max.depth", "2")
Review Comment:
We need to increase the setting because this PR fixes off by one with
enforcement.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]