GitHub user kiszk opened a pull request:

    https://github.com/apache/spark/pull/22375

    [WIP][SPARK-25388][Test] Detect incorrect nullable of DataType in the result

    ## What changes were proposed in this pull request?
    
    This PR can correctly cause assertion failure when incorrect nullable of 
DataType in the result is generated by a target function to be tested.
    
    Let us think the following example. In the future, a developer would write 
incorrect code that returns unexpected result. We have to correctly cause fail 
in this test since `valueContainsNull=false` while `expr` includes `null`. 
However, without this PR, this test passes. This PR can correctly cause fail.
    
    
    ```
    test("test TARGETFUNCTON") {
      val expr = TARGETMAPFUNCTON()
      // expr = UnsafeMap(3 -> 6, 7 -> null)
      // expr.dataType = (IntegerType, IntegerType, false)
    
      expected = Map(3 -> 6, 7 -> null)
      checkEvaluation(expr, expected)
    ```
    
    In 
[`checkEvaluationWithUnsafeProjection`](https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala#L208-L235),
 the results are compared using `UnsafeRow`. When the given `expected` is 
[converted](https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala#L226-L227))
 to `UnsafeRow` using the `DataType` of `expr`.
    ```
    val expectedRow = UnsafeProjection.create(Array(expression.dataType, 
expression.dataType)).apply(lit)
    ```
    
    In summary, `expr` is 
`[0,1800000038,5000000038,18,2,0,700000003,2,0,6,18,2,0,700000003,2,0,6]` with 
and w/o this PR. `expected` is converted to
    
    * w/o  this PR, 
`[0,1800000038,5000000038,18,2,0,700000003,2,0,6,18,2,0,700000003,2,0,6]`
    * with this PR, 
`[0,1800000038,5000000038,18,2,0,700000003,2,2,6,18,2,0,700000003,2,2,6]`
    
    As a result, w/o this PR, the test unexpectedly passes.
    
    
    This is because, w/o this PR, based on given `dataType`, generated code of 
projection for `expected` avoids to set nullbit.
    ```
    /* 155 */           for (int index_1 = 0; index_1 < numElements_1; 
index_1++) {
    /* 156 */             mutableStateArray_1[1].write(index_1, 
tmpInput_2.getInt(index_1));
    /* 157 */           }
    ```
    
    With this PR, generated code of projection for `expected` always checks 
whether nullbit should be set by `isNullAt`
    ```
    /* 161 */           for (int index_1 = 0; index_1 < numElements_1; 
index_1++) {
    /* 162 */
    /* 163 */             if (tmpInput_2.isNullAt(index_1)) {
    /* 164 */               mutableStateArray_1[1].setNull4Bytes(index_1);
    /* 165 */             } else {
    /* 166 */               mutableStateArray_1[1].write(index_1, 
tmpInput_2.getInt(index_1));
    /* 167 */             }
    /* 168 */
    /* 169 */           }
    ```
    
    ## How was this patch tested?
    
    Existing UTs

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kiszk/spark SPARK-25388

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22375.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22375
    
----
commit 48e47b28d089172efd804b44a694eabd40d03bb9
Author: Kazuaki Ishizaki <ishizaki@...>
Date:   2018-09-10T01:49:56Z

    initial commit

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to