Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22375#discussion_r218686614
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelperSuite.scala
---
@@ -35,6 +36,13 @@ class ExpressionEvalHelperSuite extends SparkFunSuite
with ExpressionEvalHelper
val e = intercept[RuntimeException] {
checkEvaluation(BadCodegenExpression(), 10) }
assert(e.getMessage.contains("some_variable"))
}
+
+ test("SPARK-25388: checkEvaluation should fail if nullable in DataType
is incorrect") {
+ val e = intercept[RuntimeException] {
+ checkEvaluation(MapIncorrectDataTypeExpression(), Map(3 -> 7, 6 ->
null))
--- End diff --
Sorry, you're right, I have not been precise in my previous statements so
they are confusing. I'll try to elaborate more clearly.
IIUC what you are proposing now is for detecting the case when the output
datatype of an expression is not nullable and the expected output instead
contains a null. The way you're trying to achieve this is to make always
nullable the expected output, so that we get a null as output both for codegen
on and codegen off (while before this change for codegen on we would get as
expected output a wrong value, which is the default value for that data type).
The problems I see on the approach above are:
- if you test only codegen on or codegen off, you might still be missing
to cover some cases (eg. see
https://github.com/apache/spark/pull/22375#discussion_r218085246, with codegen
off we can have null as output for a non-nullable expression...);
- The failure you get is that the expected output is different from the
the evaluated one, so this case is treated as any other failure in the
expression evaluation.
With something like this:
```
assert(containsNull(expected) && isNullable(expression.dataType))
```
I think we could solve the 2 problems above as:
- this check would fail in both modes;
- we can differentiate between a test failing because of a wrong output
and this new case which covers: a bad written UT; an expression returning
wrongly a non-nullable datatype instead of a nullable one when returning null.
Hope now it is more clear. Thanks.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]