MaxGekk opened a new pull request #29311:
URL: https://github.com/apache/spark/pull/29311
### What changes were proposed in this pull request?
Convert `NULL` elements of maps, structs and arrays to the `"null"` string
while converting maps/struct/array values to strings.
### Why are the changes needed?
1. It is impossible to distinguish empty string and null, for instance:
```scala
scala> Seq(Seq(""), Seq(null)).toDF().show
+-----+
|value|
+-----+
| []|
| []|
+-----+
```
2. Inconsistent NULL conversions for top-level values and nested columns,
for instance:
```scala
scala> sql("select named_struct('c', null), null").show
+---------------------+----+
|named_struct(c, NULL)|NULL|
+---------------------+----+
| []|null|
+---------------------+----+
```
3. `.show()` is different from conversions to Hive strings, and as a
consequence its output is different from `spark-sql` (sql tests):
```sql
spark-sql> select named_struct('c', null) as struct;
{"c":null}
```
```scala
scala> sql("select named_struct('c', null) as struct").show
+------+
|struct|
+------+
| []|
+------+
```
### Does this PR introduce _any_ user-facing change?
Yes, before:
```scala
scala> Seq(Seq(""), Seq(null)).toDF().show
+-----+
|value|
+-----+
| []|
| []|
+-----+
```
After:
```scala
scala> Seq(Seq(""), Seq(null)).toDF().show
+------+
| value|
+------+
| []|
|[null]|
+------+
```
### How was this patch tested?
By existing test suite `CastSuite`.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]