GitHub user kiszk opened a pull request:
https://github.com/apache/spark/pull/22375
[WIP][SPARK-25388][Test] Detect incorrect nullable of DataType in the result
## What changes were proposed in this pull request?
This PR can correctly cause assertion failure when incorrect nullable of
DataType in the result is generated by a target function to be tested.
Let us think the following example. In the future, a developer would write
incorrect code that returns unexpected result. We have to correctly cause fail
in this test since `valueContainsNull=false` while `expr` includes `null`.
However, without this PR, this test passes. This PR can correctly cause fail.
```
test("test TARGETFUNCTON") {
val expr = TARGETMAPFUNCTON()
// expr = UnsafeMap(3 -> 6, 7 -> null)
// expr.dataType = (IntegerType, IntegerType, false)
expected = Map(3 -> 6, 7 -> null)
checkEvaluation(expr, expected)
```
In
[`checkEvaluationWithUnsafeProjection`](https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala#L208-L235),
the results are compared using `UnsafeRow`. When the given `expected` is
[converted](https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala#L226-L227))
to `UnsafeRow` using the `DataType` of `expr`.
```
val expectedRow = UnsafeProjection.create(Array(expression.dataType,
expression.dataType)).apply(lit)
```
In summary, `expr` is
`[0,1800000038,5000000038,18,2,0,700000003,2,0,6,18,2,0,700000003,2,0,6]` with
and w/o this PR. `expected` is converted to
* w/o this PR,
`[0,1800000038,5000000038,18,2,0,700000003,2,0,6,18,2,0,700000003,2,0,6]`
* with this PR,
`[0,1800000038,5000000038,18,2,0,700000003,2,2,6,18,2,0,700000003,2,2,6]`
As a result, w/o this PR, the test unexpectedly passes.
This is because, w/o this PR, based on given `dataType`, generated code of
projection for `expected` avoids to set nullbit.
```
/* 155 */ for (int index_1 = 0; index_1 < numElements_1;
index_1++) {
/* 156 */ mutableStateArray_1[1].write(index_1,
tmpInput_2.getInt(index_1));
/* 157 */ }
```
With this PR, generated code of projection for `expected` always checks
whether nullbit should be set by `isNullAt`
```
/* 161 */ for (int index_1 = 0; index_1 < numElements_1;
index_1++) {
/* 162 */
/* 163 */ if (tmpInput_2.isNullAt(index_1)) {
/* 164 */ mutableStateArray_1[1].setNull4Bytes(index_1);
/* 165 */ } else {
/* 166 */ mutableStateArray_1[1].write(index_1,
tmpInput_2.getInt(index_1));
/* 167 */ }
/* 168 */
/* 169 */ }
```
## How was this patch tested?
Existing UTs
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/kiszk/spark SPARK-25388
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22375.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22375
----
commit 48e47b28d089172efd804b44a694eabd40d03bb9
Author: Kazuaki Ishizaki <ishizaki@...>
Date: 2018-09-10T01:49:56Z
initial commit
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]