Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20637#discussion_r212805339
  
    --- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala
 ---
    @@ -223,8 +223,9 @@ trait ExpressionEvalHelper extends 
GeneratorDrivenPropertyChecks with PlanTestBa
               }
             } else {
               val lit = InternalRow(expected, expected)
    +          val dtAsNullable = expression.dataType.asNullable
    --- End diff --
    
    I am sorry that I made a mistake to explain somewhere. Let me explain this 
again.
    
    `MapZipWith` without #22126 was incorrect since it generated 
`DataType(MapType(IntegerType, IntegerType, valueContainsNull = false))` for 
the result `Map(1 -> -10, 2 -> -80, 3 -> null, 4 -> null)`. This problem was 
fixed by #22126. Such a mistake will be deteched if the same issue would occur 
in the future.
    
    Now, we assume that another function `A` would generate 
`DataType(MapType(IntegerType, IntegerType, valueContainsNull = false))` for 
the result `Map(1 -> null)` if the expected result is `Map(1 -> null)`.   
    `ExpressionEvalHelper.checkEvaluationWithUnsafeProjection()` generates 
`Unsafe...` for `expected`. If this PR (#20367) is not applied, the generated 
projection code always inserts `isNullAt()`. Therefore, the `UnsafeMapData(1 -> 
null)` is generated while `DataType(MapType(IntegerType, IntegerType, 
valueContainsNull = false))`. 
     
    As you know `ExpressionEvalHelper.checkEvaluationWithUnsafeProjection()` 
uses `DataType` of `expression` when `Unsafe...` for `expected` is generated. 
This PR (#20367) removes `isNullAt()` from the generated projection code if 
`DataType.nullable` is `false`. If `Map(1 -> null)` with 
`DataType(MapType(IntegerType, IntegerType, valueContainsNull = false))` will 
be converted to `UnsafeArrayData` for `expect`, the test is succeeded 
**unexpectedly**. This is because `nullbits` in  `UnsafeMapData` for `expect` 
is not set. This `UnsafeMapData` is equal to the `UnsafeMapData` in 
`expression`.
    
    If `expression.dataType.asNullable` is applied, the generated projection 
code for `expected` has `isNullAt()`. Thus, while `expression.dataType` is 
incorrect, the `expected` is correctly converted into `Unsafe...` with `null` 
elements.
    
    The **unexpected** test pass can be recreated without #22126 and with 
#20367 by removing `asNullable`.
    
    The **correct** test failure can be produced without #22126 and with #20367 
by keeping `asNullable` as follows:
    
    ```
    org.scalatest.exceptions.TestFailedException: Incorrect evaluation in 
unsafe mode: map_zip_with(keys: [1,2,3], values: [10,20,30], keys: [1,2,4], 
values: [-1,-2,-4], lambdafunction(((lambda arg1#10 * lambda arg2#11) * lambda 
arg3#12), lambda arg1#10, lambda arg2#11, lambda arg3#12, false)), actual: 
[0,1800000048,6000000048,20,4,0,200000001,400000003,4,0,ffffffb0fffffff6,0,20,4,0,200000001,400000003,4,0,ffffffb0fffffff6,0],
 expected: 
[0,1800000048,6000000048,20,4,0,200000001,400000003,4,c,ffffffb0fffffff6,0,20,4,0,200000001,400000003,4,c,ffffffb0fffffff6,0]
    ```
    
    I would appreciate it if you would have better solution.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to