rangadi opened a new pull request, #40141:
URL: https://github.com/apache/spark/pull/40141

   ### What changes were proposed in this pull request?
   
   The Protobuf deserializer (the `from_protobuf()` function) optionally supports 
recursive fields up to a certain depth. Currently it uses `NullType` to terminate 
the recursion. But an `ArrayType` containing `NullType` is not really useful, 
and it does not work with Delta.
   
   This PR fixes this by removing the field to terminate recursion rather than 
using `NullType`. 
   The following example illustrates the difference. 
   
   E.g. Consider a recursive Protobuf like this:
   ```
   message Node {
       int32 value = 1;
       repeated Node children = 2;  // recursive array
   }
   message Tree {
       Node root = 1;
   }
   ```
   The Catalyst schema produced by `from_protobuf()` for `Tree`, with the max 
recursive depth set to 2, would be:
    
      - **Before**:  _STRUCT<root: STRUCT<value: int, children: 
array<STRUCT<value: int, **children: array<void>**>>>>_
      - **After**: _STRUCT<root: STRUCT<value: int, children: 
array<STRUCT<value: int>>>>_
   
   Notice that at the second level, the `children` array is dropped rather than 
being defined as `array<void>`. 
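
The difference between the two termination strategies can be sketched in a few lines of Python. This is a toy model only, not the connector's actual Scala code: the `MESSAGES` table, the `to_schema` helper, and the `drop_field` flag are hypothetical names introduced for illustration, and the depth check is an assumed simplification of how the max recursive depth is applied.

```python
# Toy model of the Node/Tree messages above:
# message name -> list of (field name, field type, is_repeated).
# Any type that is not a key of MESSAGES is treated as a scalar.
MESSAGES = {
    "Node": [("value", "int", False), ("children", "Node", True)],
    "Tree": [("root", "Node", False)],
}


def to_schema(message, max_depth, drop_field, seen=None):
    """Render a Catalyst-like schema string for `message`.

    `seen` counts how often each message type occurs on the current path.
    Once a type would occur more than `max_depth` times, recursion stops:
    return None (the field is dropped) or "void" (the old NullType behavior).
    """
    seen = dict(seen or {})
    seen[message] = seen.get(message, 0) + 1
    if seen[message] > max_depth:
        return None if drop_field else "void"

    parts = []
    for name, ftype, repeated in MESSAGES[message]:
        if ftype in MESSAGES:
            rendered = to_schema(ftype, max_depth, drop_field, seen)
        else:
            rendered = ftype
        if rendered is None:
            continue  # terminate recursion by omitting the field entirely
        if repeated:
            rendered = f"array<{rendered}>"
        parts.append(f"{name}: {rendered}")
    return "STRUCT<" + ", ".join(parts) + ">"


before = to_schema("Tree", max_depth=2, drop_field=False)
after = to_schema("Tree", max_depth=2, drop_field=True)
print(before)
print(after)
```

With `max_depth=2`, the two calls reproduce the **Before** and **After** schema strings listed above.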
   
   ### Why are the changes needed?
    - This improves how the Protobuf connector handles recursive fields. It avoids 
using `void` fields, which are problematic in many scenarios and do not add any 
information.
   
   ### Does this PR introduce _any_ user-facing change?
    - This changes the schema in a subtle manner when recursive field support 
is enabled. Since this only removes an optional field, it is backward 
compatible. 
   
   ### How was this patch tested?
    - Added multiple unit tests and updated an existing one. Most of the changes 
for this PR are in the tests. 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

