GitHub user kiszk opened a pull request: https://github.com/apache/spark/pull/22375
[WIP][SPARK-25388][Test] Detect incorrect nullable of DataType in the result ## What changes were proposed in this pull request? This PR can correctly cause assertion failure when incorrect nullable of DataType in the result is generated by a target function to be tested. Let us think the following example. In the future, a developer would write incorrect code that returns unexpected result. We have to correctly cause fail in this test since `valueContainsNull=false` while `expr` includes `null`. However, without this PR, this test passes. This PR can correctly cause fail. ``` test("test TARGETFUNCTON") { val expr = TARGETMAPFUNCTON() // expr = UnsafeMap(3 -> 6, 7 -> null) // expr.dataType = (IntegerType, IntegerType, false) expected = Map(3 -> 6, 7 -> null) checkEvaluation(expr, expected) ``` In [`checkEvaluationWithUnsafeProjection`](https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala#L208-L235), the results are compared using `UnsafeRow`. When the given `expected` is [converted](https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala#L226-L227)) to `UnsafeRow` using the `DataType` of `expr`. ``` val expectedRow = UnsafeProjection.create(Array(expression.dataType, expression.dataType)).apply(lit) ``` In summary, `expr` is `[0,1800000038,5000000038,18,2,0,700000003,2,0,6,18,2,0,700000003,2,0,6]` with and w/o this PR. `expected` is converted to * w/o this PR, `[0,1800000038,5000000038,18,2,0,700000003,2,0,6,18,2,0,700000003,2,0,6]` * with this PR, `[0,1800000038,5000000038,18,2,0,700000003,2,2,6,18,2,0,700000003,2,2,6]` As a result, w/o this PR, the test unexpectedly passes. This is because, w/o this PR, based on given `dataType`, generated code of projection for `expected` avoids to set nullbit. ``` /* 155 */ for (int index_1 = 0; index_1 < numElements_1; index_1++) { /* 156 */ mutableStateArray_1[1].write(index_1, tmpInput_2.getInt(index_1)); /* 157 */ } ``` With this PR, generated code of projection for `expected` always checks whether nullbit should be set by `isNullAt` ``` /* 161 */ for (int index_1 = 0; index_1 < numElements_1; index_1++) { /* 162 */ /* 163 */ if (tmpInput_2.isNullAt(index_1)) { /* 164 */ mutableStateArray_1[1].setNull4Bytes(index_1); /* 165 */ } else { /* 166 */ mutableStateArray_1[1].write(index_1, tmpInput_2.getInt(index_1)); /* 167 */ } /* 168 */ /* 169 */ } ``` ## How was this patch tested? Existing UTs You can merge this pull request into a Git repository by running: $ git pull https://github.com/kiszk/spark SPARK-25388 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22375.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22375 ---- commit 48e47b28d089172efd804b44a694eabd40d03bb9 Author: Kazuaki Ishizaki <ishizaki@...> Date: 2018-09-10T01:49:56Z initial commit ---- --- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org