[jira] [Comment Edited] (SPARK-18186) Migrate HiveUDAFFunction to TypedImperativeAggregate for partial aggregation support

Parth Gandhi (JIRA) Wed, 18 Jul 2018 14:55:07 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548452#comment-16548452
 ]


Parth Gandhi edited comment on SPARK-18186 at 7/18/18 9:54 PM:
---------------------------------------------------------------

Hi [~lian cheng], [~yhuai], there has been an issue lately with the 
sketches-hive library ([https://github.com/DataSketches/sketches-hive]), which 
builds and runs a Hive UDAF on top of Spark SQL. In their method 
getNewAggregationBuffer() 
([https://github.com/DataSketches/sketches-hive/blob/master/src/main/java/com/yahoo/sketches/hive/hll/DataToSketchUDAF.java#L106]), 
they initialize different state objects for modes Partial1 and Partial2. 
Their code worked well with Spark 2.1, when Spark supported mode 
"Complete". However, since Spark started supporting partial aggregation in 
2.2, their code fails when partial merge is invoked here 
([https://github.com/DataSketches/sketches-hive/blob/master/src/main/java/com/yahoo/sketches/hive/hll/SketchEvaluator.java#L56]), 
because the wrong state object is passed to the merge function. I am trying 
to understand the PR and am wondering why Spark stopped supporting Complete 
mode for Hive UDAFs, or whether there is a way to still run in Complete mode 
that I am not aware of. Thank you.
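
To make the failure mode concrete, here is a minimal, self-contained sketch of the pattern described above. It is not the sketches-hive code and it does not use the Hive or Spark APIs: every name in it (ModeDependentBufferSketch, RawState, UnionState, newBuffer, merge, mergeFails) is hypothetical, and the ClassCastException is only an assumption about how a "wrong state object" might surface.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class ModeDependentBufferSketch {
    // Hypothetical stand-ins for the two aggregation modes named in the comment.
    enum Mode { PARTIAL1, PARTIAL2 }

    interface State {}

    // Hypothetical state created for PARTIAL1 (consumes raw input rows).
    static final class RawState implements State {
        final Set<String> seen = new HashSet<>();
    }

    // Hypothetical state created for PARTIAL2 (merges partial results).
    static final class UnionState implements State {
        final Set<String> union = new HashSet<>();
    }

    // The buffer type depends on the mode the evaluator was initialized with.
    static State newBuffer(Mode mode) {
        return mode == Mode.PARTIAL1 ? new RawState() : new UnionState();
    }

    // merge() assumes it always receives the PARTIAL2-style state.
    static void merge(State buffer, Set<String> partial) {
        ((UnionState) buffer).union.addAll(partial);
    }

    // Returns true if merging into a buffer created for the given mode fails.
    static boolean mergeFails(Mode bufferCreatedIn) {
        try {
            merge(newBuffer(bufferCreatedIn), Collections.singleton("a"));
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(mergeFails(Mode.PARTIAL2)); // prints "false"
        System.out.println(mergeFails(Mode.PARTIAL1)); // prints "true"
    }
}
```

In this sketch, merge only works when the buffer was created for the mode that merge expects. If the caller creates the buffer under one mode and later invokes merge on it, the cast fails, which is one plausible reading of "the wrong state object is passed to the merge function".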


> Migrate HiveUDAFFunction to TypedImperativeAggregate for partial aggregation 
> support
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-18186
>                 URL: https://issues.apache.org/jira/browse/SPARK-18186
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.6.2, 2.0.1
>            Reporter: Cheng Lian
>            Assignee: Cheng Lian
>            Priority: Major
>             Fix For: 2.2.0
>
>
> Currently, Hive UDAFs in Spark SQL don't support partial aggregation. Any 
> query involving any Hive UDAFs has to fall back to {{SortAggregateExec}} 
> without partial aggregation.
> This issue can be fixed by migrating {{HiveUDAFFunction}} to 
> {{TypedImperativeAggregate}}, which already provides partial aggregation 
> support for aggregate functions that may use arbitrary Java objects as 
> aggregation states.
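
As a rough illustration of what partial aggregation with an arbitrary Java object as the aggregation state involves, the sketch below folds each partition into a buffer, serializes the buffer, and merges the deserialized buffers into a final result. It is plain Java and does not use Spark's TypedImperativeAggregate; the class and method names (PartialAggregationSketch, partial, countDistinct) and the comma-separated encoding are assumptions made for the example, and input values are assumed to be non-empty and to contain no commas.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class PartialAggregationSketch {
    // Partial step: fold one partition's rows into a buffer object,
    // then serialize the buffer so it could be shipped to another node.
    static byte[] partial(List<String> rows) {
        TreeSet<String> buffer = new TreeSet<>();
        for (String row : rows) {
            buffer.add(row);
        }
        return String.join(",", buffer).getBytes(StandardCharsets.UTF_8);
    }

    // Final step: deserialize every partial buffer, merge them,
    // and evaluate the result (here, a distinct count).
    static int countDistinct(List<byte[]> partials) {
        TreeSet<String> merged = new TreeSet<>();
        for (byte[] bytes : partials) {
            String decoded = new String(bytes, StandardCharsets.UTF_8);
            if (!decoded.isEmpty()) {
                merged.addAll(Arrays.asList(decoded.split(",")));
            }
        }
        return merged.size();
    }

    public static void main(String[] args) {
        List<byte[]> partials = new ArrayList<>();
        partials.add(partial(Arrays.asList("a", "b", "a")));
        partials.add(partial(Arrays.asList("b", "c")));
        System.out.println(countDistinct(partials)); // prints "3"
    }
}
```

Without the partial step, every input row would have to reach a single aggregation point before any work is done; with it, only one serialized buffer per partition needs to be merged.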



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

