HyukjinKwon commented on issue #25098: [SPARK-28280][SQL][PYTHON][TESTS] Convert and port 'group-by.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25098#issuecomment-512770020
 
 
   @skonto, do you know why we get these diffs?
   
   ```diff
    -- !query 2
   -SELECT COUNT(udf(a)), udf(COUNT(b)) FROM testData
   +SELECT COUNT(a), COUNT(b) FROM testData
    -- !query 2 schema
   -struct<count(udf(a)):bigint,udf(count(b)):string>
   +struct<count(a):bigint,count(b):bigint>
    -- !query 2 output
   -9   7
   +7   7
   ```
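One possible explanation, offered as a hypothesis only: the schemas show `udf(...)` returning `string`. If the test UDF stringifies its input without special-casing null (something like Python's `str(x)`), a NULL `a` comes back as the non-null string `'None'`, and `COUNT` no longer skips it. The sketch below reproduces 9 vs 7 with SQLite standing in for Spark. The `udf` definition and the nine `testData` rows are assumptions, not the actual test code. The rows are chosen to be consistent with the counts, sum and average in these diffs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical UDF: stringify every input. sqlite3 hands SQL NULL to Python
# as None, so NULL comes back as the non-NULL string 'None'.
conn.create_function("udf", 1, lambda x: str(x))

# Assumed contents of testData(a, b).
conn.execute("CREATE TABLE testData (a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO testData VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2),
     (None, 1), (3, None), (None, None)],
)

# COUNT skips NULLs, but 'None' is an ordinary string, so it gets counted.
row = conn.execute(
    "SELECT COUNT(udf(a)), COUNT(a), COUNT(b) FROM testData"
).fetchone()
print(row)  # (9, 7, 7)
```

The same mechanism would also explain `foo 9` vs `foo 7` in query 6.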
   
   ```diff
    -- !query 3
   -SELECT CAST(udf(a) as int), COUNT(udf(b)) FROM testData GROUP BY a
   +SELECT a, COUNT(b) FROM testData GROUP BY a
    -- !query 3 schema
   -struct<CAST(udf(a) AS INT):int,count(udf(b)):bigint>
   +struct<a:int,count(b):bigint>
    -- !query 3 output
    1   2
    2   2
   -3   3
   -NULL        2
   +3   2
   +NULL        1
   ```
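Under the same assumption (a `str(x)`-style UDF and the same assumed nine rows), `COUNT(udf(b))` gains one in exactly the groups that contain a NULL `b`. Those are the `a = 3` group and the NULL group, which matches the `3 3` / `NULL 2` lines above. The sketch below is hypothetical and again uses SQLite rather than Spark:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical UDF that turns NULL into the string 'None' (str(None) == 'None').
conn.create_function("udf", 1, lambda x: str(x))

# Assumed contents of testData(a, b).
conn.execute("CREATE TABLE testData (a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO testData VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2),
     (None, 1), (3, None), (None, None)],
)

# Compare COUNT(b) with COUNT(udf(b)) per group; SQLite sorts NULL first.
rows = conn.execute(
    "SELECT a, COUNT(b), COUNT(udf(b)) FROM testData GROUP BY a ORDER BY a"
).fetchall()
for r in rows:
    print(r)
# (None, 1, 2)
# (1, 2, 2)
# (2, 2, 2)
# (3, 2, 3)
```

If this holds, the `GROUP BY k` diff in query 15 follows directly. In query 5 (`GROUP BY udf(a)`), the NULL key also becomes the `'None'` group, so its `COUNT(udf(a))` would be 2 instead of 0.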
   
   ```diff
    -- !query 5
   -SELECT COUNT(udf(a)), COUNT(udf(b)) FROM testData GROUP BY udf(a)
   +SELECT COUNT(a), COUNT(b) FROM testData GROUP BY a
    -- !query 5 schema
   -struct<count(udf(a)):bigint,count(udf(b)):bigint>
   +struct<count(a):bigint,count(b):bigint>
    -- !query 5 output
   +0   1
    2   2
    2   2
   -2   2
   -3   3
   +3   2
   ```
   
   ```diff
    -- !query 6
   -SELECT 'foo', COUNT(udf(a)) FROM testData GROUP BY 1
   +SELECT 'foo', COUNT(a) FROM testData GROUP BY 1
    -- !query 6 schema
   -struct<foo:string,count(udf(a)):bigint>
   +struct<foo:string,count(a):bigint>
    -- !query 6 output
   -foo 9
   +foo 7
   ```
   
   ```diff
    -- !query 13
   -SELECT SKEWNESS(udf(a)), udf(KURTOSIS(a)), udf(MIN(a)), CAST(MAX(udf(a)) as int), udf(AVG(udf(a))), udf(VARIANCE(a)), STDDEV(udf(a)), udf(SUM(a)), udf(COUNT(a))
   +SELECT SKEWNESS(a), KURTOSIS(a), MIN(a), MAX(a), AVG(a), VARIANCE(a), STDDEV(a), SUM(a), COUNT(a)
    FROM testData
    -- !query 13 schema
   -struct<skewness(CAST(udf(a) AS DOUBLE)):double,udf(kurtosis(cast(a as double))):string,udf(min(a)):string,CAST(max(udf(a)) AS INT):int,udf(avg(cast(udf(a) as double))):string,udf(var_samp(cast(a as double))):string,stddev_samp(CAST(udf(a) AS DOUBLE)):double,udf(sum(cast(a as bigint))):string,udf(count(a)):string>
   +struct<skewness(CAST(a AS DOUBLE)):double,kurtosis(CAST(a AS DOUBLE)):double,min(a):int,max(a):int,avg(a):double,var_samp(CAST(a AS DOUBLE)):double,stddev_samp(CAST(a AS DOUBLE)):double,sum(a):bigint,count(a):bigint>
    -- !query 13 output
   --0.2723801058145729 -1.5069204152249134     1       NULL    2.142857142857143       0.8095238095238094      0.8997354108424372      15      7
   +-0.2723801058145729 -1.5069204152249134     1       3       2.142857142857143       0.8095238095238094      0.8997354108424372      15      7
   ```
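The `NULL` in the `CAST(MAX(udf(a)) as int)` column fits the same hypothesis. If `udf(a)` yields strings, `MAX` compares them lexicographically, and `'None'` sorts after `'3'` because `'N'` > `'3'` in ASCII. Casting `'None'` to `int` then gives NULL under Spark's non-ANSI `CAST`. The following is a pure-Python sketch; `udf`, `sql_max`, `cast_to_int` and the column values are hypothetical stand-ins, not Spark code.

```python
# Hypothetical stand-in for the test UDF: stringify everything, None included.
def udf(x):
    return str(x)

# Assumed values of testData.a; None stands for SQL NULL.
A_VALUES = [1, 1, 2, 2, 3, 3, None, 3, None]

def sql_max(values):
    """MAX(expr): skip NULLs, return NULL (None) if nothing remains."""
    non_null = [v for v in values if v is not None]
    return max(non_null) if non_null else None

def cast_to_int(s):
    """Rough model of a non-ANSI CAST(string AS INT): NULL on unparseable input."""
    if s is None:
        return None
    try:
        return int(s)
    except ValueError:
        return None

plain_max = sql_max(A_VALUES)                  # integer comparison
udf_max = sql_max(udf(a) for a in A_VALUES)    # string comparison
print(plain_max, repr(udf_max), cast_to_int(udf_max))  # 3 'None' None
```

`udf(MIN(a))` is unaffected because `MIN` runs on the integers before the UDF is applied, which is consistent with the `1` appearing on both sides.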
   
   ```diff
    -- !query 15
   -SELECT a AS k, COUNT(udf(b)) FROM testData GROUP BY k
   +SELECT a AS k, COUNT(b) FROM testData GROUP BY k
    -- !query 15 schema
   -struct<k:int,count(udf(b)):bigint>
   +struct<k:int,count(b):bigint>
    -- !query 15 output
    1   2
    2   2
   -3   3
   -NULL        2
   +3   2
   +NULL        1
   ```
