Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20637#discussion_r212805339
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala
---
@@ -223,8 +223,9 @@ trait ExpressionEvalHelper extends
GeneratorDrivenPropertyChecks with PlanTestBa
}
} else {
val lit = InternalRow(expected, expected)
+ val dtAsNullable = expression.dataType.asNullable
--- End diff --
I am sorry that I made a mistake to explain somewhere. Let me explain this
again.
`MapZipWith` without #22126 was incorrect since it generated
`DataType(MapType(IntegerType, IntegerType, valueContainsNull = false))` for
the result `Map(1 -> -10, 2 -> -80, 3 -> null, 4 -> null)`. This problem was
fixed by #22126. Such a mistake will be deteched if the same issue would occur
in the future.
Now, we assume that another function `A` would generate
`DataType(MapType(IntegerType, IntegerType, valueContainsNull = false))` for
the result `Map(1 -> null)` if the expected result is `Map(1 -> null)`.
`ExpressionEvalHelper.checkEvaluationWithUnsafeProjection()` generates
`Unsafe...` for `expected`. If this PR (#20367) is not applied, the generated
projection code always inserts `isNullAt()`. Therefore, the `UnsafeMapData(1 ->
null)` is generated while `DataType(MapType(IntegerType, IntegerType,
valueContainsNull = false))`.
As you know `ExpressionEvalHelper.checkEvaluationWithUnsafeProjection()`
uses `DataType` of `expression` when `Unsafe...` for `expected` is generated.
This PR (#20367) removes `isNullAt()` from the generated projection code if
`DataType.nullable` is `false`. If `Map(1 -> null)` with
`DataType(MapType(IntegerType, IntegerType, valueContainsNull = false))` will
be converted to `UnsafeArrayData` for `expect`, the test is succeeded
**unexpectedly**. This is because `nullbits` in `UnsafeMapData` for `expect`
is not set. This `UnsafeMapData` is equal to the `UnsafeMapData` in
`expression`.
If `expression.dataType.asNullable` is applied, the generated projection
code for `expected` has `isNullAt()`. Thus, while `expression.dataType` is
incorrect, the `expected` is correctly converted into `Unsafe...` with `null`
elements.
The **unexpected** test pass can be recreated without #22126 and with
#20367 by removing `asNullable`.
The **correct** test failure can be produced without #22126 and with #20367
by keeping `asNullable` as follows:
```
org.scalatest.exceptions.TestFailedException: Incorrect evaluation in
unsafe mode: map_zip_with(keys: [1,2,3], values: [10,20,30], keys: [1,2,4],
values: [-1,-2,-4], lambdafunction(((lambda arg1#10 * lambda arg2#11) * lambda
arg3#12), lambda arg1#10, lambda arg2#11, lambda arg3#12, false)), actual:
[0,1800000048,6000000048,20,4,0,200000001,400000003,4,0,ffffffb0fffffff6,0,20,4,0,200000001,400000003,4,0,ffffffb0fffffff6,0],
expected:
[0,1800000048,6000000048,20,4,0,200000001,400000003,4,c,ffffffb0fffffff6,0,20,4,0,200000001,400000003,4,c,ffffffb0fffffff6,0]
```
I would appreciate it if you would have better solution.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]