HyukjinKwon commented on issue #25098: [SPARK-28280][SQL][PYTHON][TESTS] Convert and port 'group-by.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25098#issuecomment-512770020

@skonto, do you know why we have such a diff?

```diff
 -- !query 2
-SELECT COUNT(udf(a)), udf(COUNT(b)) FROM testData
+SELECT COUNT(a), COUNT(b) FROM testData
 -- !query 2 schema
-struct<count(udf(a)):bigint,udf(count(b)):string>
+struct<count(a):bigint,count(b):bigint>
 -- !query 2 output
-9 7
+7 7
```

```diff
 -- !query 3
-SELECT CAST(udf(a) as int), COUNT(udf(b)) FROM testData GROUP BY a
+SELECT a, COUNT(b) FROM testData GROUP BY a
 -- !query 3 schema
-struct<CAST(udf(a) AS INT):int,count(udf(b)):bigint>
+struct<a:int,count(b):bigint>
 -- !query 3 output
 1 2
 2 2
-3 3
-NULL 2
+3 2
+NULL 1
```

```diff
 -- !query 5
-SELECT COUNT(udf(a)), COUNT(udf(b)) FROM testData GROUP BY udf(a)
+SELECT COUNT(a), COUNT(b) FROM testData GROUP BY a
 -- !query 5 schema
-struct<count(udf(a)):bigint,count(udf(b)):bigint>
+struct<count(a):bigint,count(b):bigint>
 -- !query 5 output
+0 1
 2 2
 2 2
-2 2
-3 3
+3 2
```

```diff
 -- !query 6
-SELECT 'foo', COUNT(udf(a)) FROM testData GROUP BY 1
+SELECT 'foo', COUNT(a) FROM testData GROUP BY 1
 -- !query 6 schema
-struct<foo:string,count(udf(a)):bigint>
+struct<foo:string,count(a):bigint>
 -- !query 6 output
-foo 9
+foo 7
```

```diff
 -- !query 13
-SELECT SKEWNESS(udf(a)), udf(KURTOSIS(a)), udf(MIN(a)), CAST(MAX(udf(a)) as int), udf(AVG(udf(a))), udf(VARIANCE(a)), STDDEV(udf(a)), udf(SUM(a)), udf(COUNT(a))
+SELECT SKEWNESS(a), KURTOSIS(a), MIN(a), MAX(a), AVG(a), VARIANCE(a), STDDEV(a), SUM(a), COUNT(a)
 FROM testData
 -- !query 13 schema
-struct<skewness(CAST(udf(a) AS DOUBLE)):double,udf(kurtosis(cast(a as double))):string,udf(min(a)):string,CAST(max(udf(a)) AS INT):int,udf(avg(cast(udf(a) as double))):string,udf(var_samp(cast(a as double))):string,stddev_samp(CAST(udf(a) AS DOUBLE)):double,udf(sum(cast(a as bigint))):string,udf(count(a)):string>
+struct<skewness(CAST(a AS DOUBLE)):double,kurtosis(CAST(a AS DOUBLE)):double,min(a):int,max(a):int,avg(a):double,var_samp(CAST(a AS DOUBLE)):double,stddev_samp(CAST(a AS DOUBLE)):double,sum(a):bigint,count(a):bigint>
 -- !query 13 output
--0.2723801058145729 -1.5069204152249134 1 NULL 2.142857142857143 0.8095238095238094 0.8997354108424372 15 7
+-0.2723801058145729 -1.5069204152249134 1 3 2.142857142857143 0.8095238095238094 0.8997354108424372 15 7
```

```diff
 -- !query 15
-SELECT a AS k, COUNT(udf(b)) FROM testData GROUP BY k
+SELECT a AS k, COUNT(b) FROM testData GROUP BY k
 -- !query 15 schema
-struct<k:int,count(udf(b)):bigint>
+struct<k:int,count(b):bigint>
 -- !query 15 output
 1 2
 2 2
-3 3
-NULL 2
+3 2
+NULL 1
```
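
One pattern that would be consistent with the count differences above, offered only as an assumption and not confirmed anywhere in this comment: `COUNT(expr)` skips NULLs, so a UDF that turns a NULL input into a non-NULL string would inflate the counts. The sketch below models that in plain Python without Spark. The `rows` data, `sql_count`, `str_udf`, and `null_safe_udf` are all hypothetical names and values chosen for illustration; they are not taken from the PR.

```python
# Hypothetical (a, b) rows with NULLs modeled as None.
# Illustrative data only; not claimed to be the real testData.
rows = [
    (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2),
    (None, 1), (3, None), (None, None),
]


def sql_count(values):
    """Mimic SQL COUNT(expr): count only non-NULL (non-None) values."""
    return sum(1 for v in values if v is not None)


def str_udf(value):
    """Hypothetical UDF that stringifies everything, so None becomes 'None'."""
    return str(value)


def null_safe_udf(value):
    """Hypothetical variant that passes NULL through unchanged."""
    return None if value is None else str(value)


a_values = [a for a, _ in rows]

count_plain = sql_count(a_values)                                # like COUNT(a)
count_udf = sql_count(str_udf(a) for a in a_values)              # like COUNT(udf(a))
count_null_safe = sql_count(null_safe_udf(a) for a in a_values)  # NULLs preserved

print(count_plain, count_udf, count_null_safe)
```

Under these assumptions the stringifying UDF counts every row, while the plain and NULL-preserving versions count only the rows where `a` is not NULL. Whether the UDF in this PR actually behaves this way would need to be checked against its implementation.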
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
