huaxingao commented on a change in pull request #25101:
[SPARK-28277][SQL][PYTHON][TESTS] Convert and port 'except.sql' into UDF test
base
URL: https://github.com/apache/spark/pull/25101#discussion_r303188362
##########
File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except.sql
##########
@@ -0,0 +1,58 @@
+-- This test file was converted from except.sql.
+-- Tests different scenarios of except operation
+create temporary view t1 as select * from values
+ ("one", 1),
+ ("two", 2),
+ ("three", 3),
+ ("one", NULL)
+ as t1(k, v);
+
+create temporary view t2 as select * from values
+ ("one", 1),
+ ("two", 22),
+ ("one", 5),
+ ("one", NULL),
+ (NULL, 5)
+ as t2(k, v);
+
+
+-- Except operation that will be replaced by left anti join
+SELECT * FROM t1 EXCEPT SELECT * FROM t2;
+
+
+-- Except operation that will be replaced by Filter: SPARK-22181
+SELECT * FROM t1 EXCEPT SELECT * FROM t1 where udf(v) <> 1 and v <> udf(2);
+
+
+-- Except operation that will be replaced by Filter: SPARK-22181
+SELECT * FROM t1 where udf(v) <> 1 and v <> udf(22) EXCEPT SELECT * FROM t1
where udf(v) <> 2 and v >= udf(3);
+
+
+-- Except operation that will be replaced by Filter: SPARK-22181
+SELECT t1.* FROM t1, t2 where t1.k = t2.k
+EXCEPT
+SELECT t1.* FROM t1, t2 where t1.k = t2.k and t1.k != udf('one');
+
+
+-- Except operation that will be replaced by left anti join
+SELECT * FROM t2 where v >= udf(1) and udf(v) <> 22 EXCEPT SELECT * FROM t1;
+
+
+-- Except operation that will be replaced by left anti join
+SELECT (SELECT min(udf(k)) FROM t2 WHERE t2.k = t1.k) min_t2 FROM t1
+MINUS
+SELECT (SELECT udf(min(k)) FROM t2) abs_min_t2 FROM t1 WHERE t1.k =
udf('one');
+
+
+-- Except operation that will be replaced by left anti join
+SELECT t1.k
+FROM t1
+WHERE t1.v <= (SELECT max(udf(t2.v))
Review comment:
I did some more tests on this ```max```. Here are the results:
```
-- !query 10
SELECT max(t2.v) FROM t2
-- !query 10 schema
struct<max(v):int>
-- !query 10 output
22
-- !query 11
SELECT udf(t2.v) FROM t2
-- !query 11 schema
struct<udf(v):string>
-- !query 11 output
1
22
5.0
5.0
nan
-- !query 12
SELECT max(udf(t2.v)) FROM t2
-- !query 12 schema
struct<max(udf(v)):string>
-- !query 12 output
nan
-- !query 13
SELECT udf(max(t2.v)) FROM t2
-- !query 13 schema
struct<udf(max(v)):string>
-- !query 13 output
22
-- !query 14
SELECT udf(max(udf(t2.v))) FROM t2
-- !query 14 schema
struct<udf(max(udf(v))):string>
-- !query 14 output
nan
-- !query 15
SELECT *
FROM t1
WHERE t1.v <= (SELECT max(t2.v)
FROM t2
WHERE t2.k = t1.k)
-- !query 15 schema
struct<k:string,v:int>
-- !query 15 output
one 1
two 2
-- !query 16
SELECT *
FROM t1
WHERE t1.v <= (SELECT udf(max(t2.v))
FROM t2
WHERE t2.k = t1.k)
-- !query 16 schema
struct<>
-- !query 16 output
java.lang.UnsupportedOperationException
Cannot evaluate expression: udf(null)
-- !query 17
SELECT *
FROM t1
WHERE t1.v <= (SELECT max(udf(t2.v))
FROM t2
WHERE t2.k = t1.k)
-- !query 17 schema
struct<k:string,v:int>
-- !query 17 output
two 2
-- !query 18
SELECT *
FROM t1
WHERE t1.v <= (SELECT udf(max(udf(t2.v)))
FROM t2
WHERE t2.k = t1.k)
-- !query 18 schema
struct<>
-- !query 18 output
java.lang.UnsupportedOperationException
Cannot evaluate expression: udf(null)
```
I initially thought ```max(t2.v) returns 22``` is the right behavior, but
after I looked the implementation of ```functions.max(e: Column)```, I am not
sure any more. This ```functions.max(e: Column)``` eventually calls this
```nanSafeCompareDoubles```, and it treats NaN greater than any non-NaN
double. So ```functions.max(t2.v)``` returns ```NaN```. Is this right?
Should we let ```max(t2.v) returns 22``` instead of ```NaN```?
```
/**
* NaN-safe version of `java.lang.Double.compare()` which allows NaN
values to be compared
* according to semantics where NaN == NaN and NaN is greater than any
non-NaN double.
*/
def nanSafeCompareDoubles(x: Double, y: Double): Int = {
val xIsNan: Boolean = java.lang.Double.isNaN(x)
val yIsNan: Boolean = java.lang.Double.isNaN(y)
if ((xIsNan && yIsNan) || (x == y)) 0
else if (xIsNan) 1
else if (yIsNan) -1
else if (x > y) 1
else -1
}
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]