chenhao-db opened a new pull request, #45479:
URL: https://github.com/apache/spark/pull/45479
### What changes were proposed in this pull request?
This PR adds a real implementation of the `parse_json` expression, replacing the current placeholder implementation. The expression parses a JSON string into a variant value. It throws an exception when the string is not a valid JSON value or when the resulting variant does not fit within the size limit.
The feature is built on a new library, `common/variant`, which contains utility functions for building and manipulating binary-encoded variant values. It also contains a `README` that describes the variant binary format. The library is intended to be usable outside of Spark as well.
Some usage examples of `parse_json`:
```sql
select parse_json('{"a": 1, "b": 2}');

create table variant_table as select parse_json(j) as v from json_table;

-- throws an exception because the input is not valid JSON
select parse_json('[');

-- throws an exception because the resulting variant exceeds the size limit
select parse_json('"' || repeat('a', 16 * 1024 * 1024) || '"');
```
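The two failure modes above can be modeled conceptually as a validation step followed by a size check. This is only a sketch, not the real `common/variant` encoder: the actual implementation produces the binary variant format described in the library's `README`, and the exact size limit is an implementation detail (the 16 MiB figure here is inferred from the example above). The function name `parse_json_sketch` and the UTF-8 stand-in encoding are assumptions for illustration.

```python
import json

# Assumed limit for illustration only; inferred from the ~16 MiB failing
# example in the PR description, not taken from the Spark source.
SIZE_LIMIT = 16 * 1024 * 1024

def parse_json_sketch(s: str) -> bytes:
    """Conceptual model of parse_json's error behavior.

    Returns a stand-in encoding (plain UTF-8 bytes, NOT the real
    binary variant format) when the input is valid and small enough.
    """
    try:
        json.loads(s)  # invalid JSON raises, like parse_json('[')
    except json.JSONDecodeError as e:
        raise ValueError(f"not a valid JSON value: {e}")
    encoded = s.encode("utf-8")  # stand-in for the variant encoding
    if len(encoded) > SIZE_LIMIT:
        raise ValueError("variant exceeds the size limit")
    return encoded
```

In the real expression both conditions surface as query-time exceptions rather than nulls, matching the SQL examples above.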
### Does this PR introduce _any_ user-facing change?
Yes, the `parse_json` expression will return an actual binary-encoded
variant value rather than the original placeholder value.
### How was this patch tested?
Unit tests validate the `parse_json` result, including negative cases where the expression fails on invalid or oversized JSON.
Some unit tests are temporarily disabled because the `toString` implementation does not yet match the new `parse_json` implementation. I will add a new `toString` implementation shortly and re-enable them.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]