HyukjinKwon commented on a change in pull request #25195: [SPARK-28288][SQL][PYTHON][TESTS] Convert and port 'window.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25195#discussion_r306589637
##########
File path: sql/core/src/test/resources/sql-tests/results/udf/udf-window.sql.out
##########
@@ -0,0 +1,389 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 23
+
+
+-- !query 0
+CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
+(null, 1L, 1.0D, date("2017-08-01"), timestamp(1501545600), "a"),
+(1, 1L, 1.0D, date("2017-08-01"), timestamp(1501545600), "a"),
+(1, 2L, 2.5D, date("2017-08-02"), timestamp(1502000000), "a"),
+(2, 2147483650L, 100.001D, date("2020-12-31"), timestamp(1609372800), "a"),
+(1, null, 1.0D, date("2017-08-01"), timestamp(1501545600), "b"),
+(2, 3L, 3.3D, date("2017-08-03"), timestamp(1503000000), "b"),
+(3, 2147483650L, 100.001D, date("2020-12-31"), timestamp(1609372800), "b"),
+(null, null, null, null, null, null),
+(3, 1L, 1.0D, date("2017-08-01"), timestamp(1501545600), null)
+AS testData(val, val_long, val_double, val_date, val_timestamp, cate)
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+
+-- !query 1
+SELECT udf(val), cate, count(val) OVER(PARTITION BY cate ORDER BY val ROWS CURRENT ROW) FROM testData
+ORDER BY cate, val
+-- !query 1 schema
+struct<CAST(udf(cast(val as string)) AS INT):int,cate:string,count(val) OVER (PARTITION BY cate ORDER BY val ASC NULLS FIRST ROWS BETWEEN CURRENT ROW AND CURRENT ROW):bigint>
+-- !query 1 output
+NULL NULL 0
+3 NULL 1
+NULL a 0
+1 a 1
+1 a 1
+2 a 1
+1 b 1
+2 b 1
+3 b 1
+
+
+-- !query 2
+SELECT udf(val), cate, sum(val) OVER(PARTITION BY cate ORDER BY val
+ROWS BETWEEN UNBOUNDED PRECEDING AND 1 FOLLOWING) FROM testData ORDER BY cate, val
+-- !query 2 schema
+struct<CAST(udf(cast(val as string)) AS INT):int,cate:string,sum(val) OVER (PARTITION BY cate ORDER BY val ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND 1 FOLLOWING):bigint>
+-- !query 2 output
+NULL NULL 3
+3 NULL 3
+NULL a 1
+1 a 2
+1 a 4
+2 a 4
+1 b 3
+2 b 6
+3 b 6
+
+
+-- !query 3
##########
Review comment:
Hmm ...
```diff
-- !query 3
-SELECT val_long, cate, sum(val_long) OVER(PARTITION BY cate ORDER BY val_long
-ROWS BETWEEN CURRENT ROW AND 2147483648 FOLLOWING) FROM testData ORDER BY cate, val_long
+SELECT val_long, udf(cate), sum(val_long) OVER(PARTITION BY cate ORDER BY val_long
+ROWS BETWEEN CURRENT ROW AND CAST(2147483648 AS int) FOLLOWING) FROM testData ORDER BY cate, val_long
-- !query 3 schema
-struct<>
+struct<val_long:bigint,CAST(udf(cast(cate as string)) AS STRING):string,sum(val_long) OVER (PARTITION BY cate ORDER BY val_long ASC NULLS FIRST ROWS BETWEEN CURRENT ROW AND CAST(2147483648 AS INT) FOLLOWING):bigint>
-- !query 3 output
-org.apache.spark.sql.AnalysisException
-cannot resolve 'ROWS BETWEEN CURRENT ROW AND 2147483648L FOLLOWING' due to data type mismatch: The data type of the upper bound 'bigint' does not match the expected data type 'int'.; line 1 pos 41
+NULL NULL 1
+1 NULL 1
+1 a 2147483654
+1 a 2147483653
+2 a 2147483652
+2147483650 a 2147483650
+NULL b 2147483653
+3 b 2147483653
+2147483650 b 2147483650
```
Do you know why this works when it's wrapped by udf?
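In case it helps narrow this down, here is a minimal PySpark sketch that runs the frame-bound query with and without the CAST and with no udf involved at all. It is a local stand-in, not the suite's actual setup, and the three-row fixture is a hypothetical simplification of the testData view above:

```python
# Minimal local repro sketch: checks whether the AnalysisException depends
# on the bigint upper bound itself, independent of any udf() wrapping.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").appName("frame-bound-check").getOrCreate()

# Hypothetical simplified fixture; the real test uses the full testData view.
spark.sql("""
CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
  (1L, 'a'), (2L, 'a'), (2147483650L, 'a')
AS testData(val_long, cate)
""")

# 1) bigint upper bound, no udf: expected to fail analysis, matching the
#    removed output in the diff above.
try:
    spark.sql("""
SELECT val_long, sum(val_long) OVER(PARTITION BY cate ORDER BY val_long
ROWS BETWEEN CURRENT ROW AND 2147483648 FOLLOWING) FROM testData
""").show()
except Exception as e:
    print("bigint bound failed:", type(e).__name__)

# 2) Same query with the CAST from the ported test, still no udf. If this
#    one succeeds too, the behavior change comes from rewriting the bound
#    (CAST(2147483648 AS int) overflows the int range and, with non-ANSI
#    casts, presumably wraps), not from wrapping a column in udf().
spark.sql("""
SELECT val_long, sum(val_long) OVER(PARTITION BY cate ORDER BY val_long
ROWS BETWEEN CURRENT ROW AND CAST(2147483648 AS int) FOLLOWING) FROM testData
""").show()
```

If (2) succeeds with no udf in the query, the new results in the diff are explained by the rewritten bound rather than by the udf wrapping, and the ported test would no longer exercise the original error case.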