[jira] [Created] (SPARK-13680) Java UDAF with more than one intermediate argument returns wrong results

Yael Aharon (JIRA) Fri, 04 Mar 2016 08:23:16 -0800

Yael Aharon created SPARK-13680:
-----------------------------------

             Summary: Java UDAF with more than one intermediate argument 
returns wrong results
                 Key: SPARK-13680
                 URL: https://issues.apache.org/jira/browse/SPARK-13680
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.5.0
         Environment: CDH 5.5.2
            Reporter: Yael Aharon



I am trying to incorporate the Java UDAF from 
https://github.com/apache/spark/blob/master/sql/hive/src/test/java/org/apache/spark/sql/hive/aggregate/MyDoubleAvg.java
 into an SQL query. 
I registered the UDAF like this:
 sqlContext.udf().register("myavg", new MyDoubleAvg());

My SQL query is:
SELECT
  AVG(seqi) AS `avg_seqi`, AVG(seqd) AS `avg_seqd`, AVG(ci) AS `avg_ci`,
  AVG(cd) AS `avg_cd`, AVG(stdevd) AS `avg_stdevd`, AVG(stdevi) AS `avg_stdevi`,
  MAX(seqi) AS `max_seqi`, MAX(seqd) AS `max_seqd`, MAX(ci) AS `max_ci`,
  MAX(cd) AS `max_cd`, MAX(stdevd) AS `max_stdevd`, MAX(stdevi) AS `max_stdevi`,
  MIN(seqi) AS `min_seqi`, MIN(seqd) AS `min_seqd`, MIN(ci) AS `min_ci`,
  MIN(cd) AS `min_cd`, MIN(stdevd) AS `min_stdevd`, MIN(stdevi) AS `min_stdevi`,
  SUM(seqi) AS `sum_seqi`, SUM(seqd) AS `sum_seqd`, SUM(ci) AS `sum_ci`,
  SUM(cd) AS `sum_cd`, SUM(stdevd) AS `sum_stdevd`, SUM(stdevi) AS `sum_stdevi`,
  myavg(seqd) AS `myavg_seqd`,
  AVG(zero) AS `avg_zero`, AVG(nulli) AS `avg_nulli`, AVG(nulld) AS `avg_nulld`,
  SUM(zero) AS `sum_zero`, SUM(nulli) AS `sum_nulli`, SUM(nulld) AS `sum_nulld`,
  MAX(zero) AS `max_zero`, MAX(nulli) AS `max_nulli`, MAX(nulld) AS `max_nulld`,
  count(*) AS `count_all`, count(nulli) AS `count_nulli`
FROM mytable

As soon as I add the myavg UDAF to the query, all the results become incorrect.
When I remove the call to the UDAF, the results are correct.
I was able to work around the issue by changing the UDAF's bufferSchema to a
single array field and adjusting the update and merge methods accordingly.
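
To illustrate the shape of that workaround, here is a minimal sketch. It is not
the modified UDAF itself and it does not depend on Spark: it only models an
average whose whole intermediate state lives in one array instead of two
separate buffer fields. The class name ArrayBufferAvg, the slot constants SUM
and COUNT, and the choice to store the count as a double (so both values fit in
one uniformly typed array) are assumptions made for illustration. The method
names mirror the initialize, update, merge and evaluate steps of a UDAF, and the
result is a plain average.

```java
import java.util.Arrays;

/** Spark-free model of an average aggregate whose intermediate state is one array. */
public class ArrayBufferAvg {
    // Hypothetical slot layout inside the single buffer array.
    public static final int SUM = 0;
    public static final int COUNT = 1;

    /** Start with an empty buffer: sum = 0, count = 0. */
    public static double[] initialize() {
        return new double[] {0.0, 0.0};
    }

    /** Fold one input value into the buffer; null inputs are skipped. */
    public static void update(double[] buffer, Double input) {
        if (input == null) {
            return;
        }
        buffer[SUM] += input;
        buffer[COUNT] += 1.0;
    }

    /** Combine a second partial buffer into the first one. */
    public static void merge(double[] target, double[] other) {
        target[SUM] += other[SUM];
        target[COUNT] += other[COUNT];
    }

    /** Final result: null when no non-null input was seen, else sum / count. */
    public static Double evaluate(double[] buffer) {
        if (buffer[COUNT] == 0.0) {
            return null;
        }
        return buffer[SUM] / buffer[COUNT];
    }

    public static void main(String[] args) {
        // Two partial aggregations, as if computed on separate partitions.
        double[] p1 = initialize();
        for (Double v : Arrays.asList(1.0, 2.0, null)) {
            update(p1, v);
        }
        double[] p2 = initialize();
        for (Double v : Arrays.asList(3.0, 4.0)) {
            update(p2, v);
        }
        merge(p1, p2);
        System.out.println(evaluate(p1)); // prints 2.5
    }
}
```

In an actual UDAF the same layout would presumably be declared as a single
array-typed field in bufferSchema, with update and merge reading and writing
that one field.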



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

