bersprockets opened a new pull request, #39970:
URL: https://github.com/apache/spark/pull/39970
### What changes were proposed in this pull request?
In the `DataType` instance returned by `ArrayInsert#dataType` and
`ArrayAppend#dataType`, set `containsNull` to true if either of the
following holds:
- the input array has `containsNull` set to true;
- the expression to be inserted/appended is nullable.
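The rule above can be sketched as follows. This is a simplified, self-contained model, not Spark's actual code: `DataType`, `IntegerType`, `StringType` and `ArrayType` here are hypothetical stand-ins that only mirror the shape of Spark's `ArrayType(elementType, containsNull)`, and `resultType` is a hypothetical helper.

```scala
// Hypothetical stand-ins for Spark's DataType hierarchy (not the real classes).
sealed trait DataType
case object IntegerType extends DataType
case object StringType extends DataType
final case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType

object ArrayResultType {
  // Result type of inserting/appending one element into an array of type `input`:
  // the element type is unchanged, but containsNull is true if the input array
  // may already hold nulls OR the new element itself may be null.
  def resultType(input: ArrayType, itemNullable: Boolean): ArrayType =
    input.copy(containsNull = input.containsNull || itemNullable)

  def main(args: Array[String]): Unit = {
    // Models array(1, 2, 3, 4): no null elements, so containsNull = false.
    val literalArray = ArrayType(IntegerType, containsNull = false)
    // Models appending cast(null as int), a nullable item.
    println(resultType(literalArray, itemNullable = true))  // ArrayType(IntegerType,true)
    // Models appending a non-null literal such as 5.
    println(resultType(literalArray, itemNullable = false)) // ArrayType(IntegerType,false)
  }
}
```

The buggy behavior corresponds to returning `input` unchanged, which keeps `containsNull = false` even when the appended item is null.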
### Why are the changes needed?
The following two queries return the wrong answer:
```
spark-sql> select array_insert(array(1, 2, 3, 4), 5, cast(null as int));
[1,2,3,4,0] <== should be [1,2,3,4,null]
Time taken: 3.879 seconds, Fetched 1 row(s)
spark-sql> select array_append(array(1, 2, 3, 4), cast(null as int));
[1,2,3,4,0] <== should be [1,2,3,4,null]
Time taken: 0.068 seconds, Fetched 1 row(s)
spark-sql>
```
The following two queries throw a `NullPointerException`:
```
spark-sql> select array_insert(array('1', '2', '3', '4'), 5, cast(null as string));
23/02/10 11:24:59 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2)
java.lang.NullPointerException
	at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
...
spark-sql> select array_append(array('1', '2', '3', '4'), cast(null as string));
23/02/10 11:25:10 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
java.lang.NullPointerException
	at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
...
spark-sql>
```
The bug arises because both `ArrayInsert` and `ArrayAppend` use the first
child's data type (the input array's type) as the function's data type.
That is, they inherit the input array's `containsNull` setting, regardless
of whether the insert/append operation might produce an array containing a
null value.
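Why a wrong `containsNull` can turn a null into `0` can be illustrated with a deliberately simplified toy model. This is an assumption-laden sketch, not Spark's code generator: `ToyIntArray` and `render` are hypothetical names, and the model only assumes that a consumer which trusts `containsNull = false` skips the per-element null check and reads the raw slot, whose value for a null int element is left as 0.

```scala
// Toy model (hypothetical, not Spark code): an int array stored as raw slots
// plus a null mask; a null element's raw slot value is left as 0.
final case class ToyIntArray(slots: Array[Int], isNull: Array[Boolean])

object ContainsNullDemo {
  // Renders the array the way a consumer of the declared type might.
  // If the type claims containsNull = false, the null check is skipped
  // and the raw slot value is read instead.
  def render(arr: ToyIntArray, containsNull: Boolean): Seq[String] =
    arr.slots.indices.map { i =>
      if (containsNull && arr.isNull(i)) "null" else arr.slots(i).toString
    }

  def main(args: Array[String]): Unit = {
    // Models appending a null to [1, 2, 3, 4]: the last slot is marked null.
    val appended = ToyIntArray(Array(1, 2, 3, 4, 0), Array(false, false, false, false, true))
    println(render(appended, containsNull = false).mkString("[", ",", "]")) // [1,2,3,4,0]
    println(render(appended, containsNull = true).mkString("[", ",", "]"))  // [1,2,3,4,null]
  }
}
```

For a reference type such as a string, the analogous raw read yields a `null` reference rather than a default value, which is consistent with the `NullPointerException` in `UnsafeWriter.write` shown above.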
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
New unit tests.
--
This is an automated message from the Apache Git Service.