rangadi opened a new pull request, #40141:
URL: https://github.com/apache/spark/pull/40141
### What changes were proposed in this pull request?
The Protobuf deserializer (the `from_protobuf()` function) optionally supports
recursive fields up to a certain depth. Currently it uses `NullType` to terminate
the recursion. But an `ArrayType` containing `NullType` is not really useful,
and it does not work with Delta.
This PR fixes this by removing the field to terminate recursion rather than
using `NullType`.
The following example illustrates the difference.
Consider a recursive Protobuf definition like this:
```
message Node {
  int32 value = 1;
  repeated Node children = 2; // recursive array
}
message Tree {
  Node root = 1;
}
```
The Catalyst schema produced by `from_protobuf()` for `Tree`, with the max
recursive depth set to 2, would be:
- **Before**: _STRUCT<root: STRUCT<value: int, children:
array<STRUCT<value: int, **children: array<void>**>>>>_
- **After**: _STRUCT<root: STRUCT<value: int, children:
array<STRUCT<value: int>>>>_
Notice that at the second level, the `children` array is dropped rather than
being defined as `array<void>`.
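The difference between the two termination strategies can be sketched in a few lines of plain Python. This is only an illustration, not the connector's actual code: the `MESSAGES` table, the `catalyst_type` helper, and the `drop_recursive` flag are hypothetical names, and the sketch assumes depth is counted as the number of times a message type already appears on the current path.

```python
# Hypothetical, simplified descriptors: message name -> (field name, type, repeated).
# A type that is also a key of MESSAGES refers to a nested message.
MESSAGES = {
    "Node": [("value", "int32", False), ("children", "Node", True)],
    "Tree": [("root", "Node", False)],
}

# Only the scalar type used in the example above.
SCALAR_TYPES = {"int32": "int"}


def catalyst_type(message, max_depth, drop_recursive, seen=None):
    """Render a schema string for `message`, stopping recursion at `max_depth`.

    Assumption: `seen` counts how often each message type occurs on the
    current path; a message field whose type was already seen `max_depth`
    times terminates the recursion.
    """
    seen = dict(seen or {})
    seen[message] = seen.get(message, 0) + 1
    fields = []
    for name, ftype, repeated in MESSAGES[message]:
        if ftype in MESSAGES:
            if seen.get(ftype, 0) >= max_depth:
                if drop_recursive:
                    continue  # "After": omit the field entirely
                sql = "void"  # "Before": terminate with a NullType placeholder
            else:
                sql = catalyst_type(ftype, max_depth, drop_recursive, seen)
        else:
            sql = SCALAR_TYPES[ftype]
        if repeated:
            sql = f"array<{sql}>"
        fields.append(f"{name}: {sql}")
    return "STRUCT<" + ", ".join(fields) + ">"


before = catalyst_type("Tree", 2, drop_recursive=False)
after = catalyst_type("Tree", 2, drop_recursive=True)
print(before)
print(after)
```

With a depth of 2, the two printed strings correspond to the **Before** and **After** schemas listed above.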
### Why are the changes needed?
- This improves how the Protobuf connector handles recursive fields. It avoids
using `void` fields, which are problematic in many scenarios and do not add any
information.
### Does this PR introduce _any_ user-facing change?
- This subtly changes the schema when recursive field support is enabled.
Since it only removes an optional field, it is backward
compatible.
### How was this patch tested?
- Added multiple unit tests and updated an existing one. Most of the changes
in this PR are in the tests.