GitHub user chenghao-intel opened a pull request:

    https://github.com/apache/spark/pull/5542

    [SPARK-4233] [SQL] [WIP] UDAF Interface Refactoring

    This PR keeps both the old and new versions of the UDAF interface; switch between them with:
    ```
    SET spark.sql.aggregate2=true/false;
    ```
    The new interface is:
    ```scala
    trait AggregateFunction2 {
      self: Product =>
    
      // Specify the BoundReferences for the aggregation buffer
      def initialize(buffers: Seq[BoundReference]): Unit
    
      // Initialize (reinitialize) the aggregation buffer
      def reset(buf: MutableRow): Unit
    
      // Read the children's values from the input row and merge them
      // into the given aggregation buffer. `seen` is the set of values
      // that have already appeared, which is useful for distinct
      // aggregates; it will probably be null for non-distinct aggregates.
      def update(input: Row, buf: MutableRow, seen: JSet[Any]): Unit
    
      // Merge two aggregation buffers and write the result back to the latter one
      def merge(value: Row, buf: MutableRow): Unit
    
      // Semantically we probably don't need this; however, it is required when
      // integrating with Hive UDAFs (GenericUDAF)
      @deprecated
      def terminatePartial(buf: MutableRow): Unit = {}
    
      // Produce the final result from the aggregation buffer
      def terminate(buffer: Row): Any
    }
    ```
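
    To illustrate how these methods fit together, here is a minimal sketch of a
    sum aggregate written against the trait. It is not code from this PR:
    `Row`, `MutableRow` and `BoundReference` are simplified stand-ins rather
    than Spark's classes, `terminatePartial` is left out for brevity, and
    `LongSum`, `UdafSketch`, `partial`, `sumOf` and `mergedSum` are hypothetical
    names. Treating a non-null `seen` set as the de-duplication filter is an
    assumption based on the comment on `update`.
    ```scala
    import java.util.{HashSet => JHashSet, Set => JSet}

    // Simplified stand-ins (assumptions), NOT Spark's real classes.
    trait Row { def apply(i: Int): Any }
    class MutableRow(size: Int) extends Row {
      private val values = new Array[Any](size)
      def apply(i: Int): Any = values(i)
      def update(i: Int, v: Any): Unit = { values(i) = v }
    }
    case class BoundReference(ordinal: Int)

    // The trait from the description, reproduced against the stand-ins
    // (terminatePartial omitted for brevity).
    trait AggregateFunction2 {
      self: Product =>
      def initialize(buffers: Seq[BoundReference]): Unit
      def reset(buf: MutableRow): Unit
      def update(input: Row, buf: MutableRow, seen: JSet[Any]): Unit
      def merge(value: Row, buf: MutableRow): Unit
      def terminate(buffer: Row): Any
    }

    // Hypothetical aggregate: sums the Long found at `childOrdinal` of each input row.
    case class LongSum(childOrdinal: Int) extends AggregateFunction2 {
      private var slot = -1 // position of this aggregate's value in the buffer

      def initialize(buffers: Seq[BoundReference]): Unit = { slot = buffers.head.ordinal }

      def reset(buf: MutableRow): Unit = { buf(slot) = 0L }

      def update(input: Row, buf: MutableRow, seen: JSet[Any]): Unit = {
        val v = input(childOrdinal)
        // Skip nulls; when a `seen` set is supplied, also skip repeated values.
        if (v != null && (seen == null || seen.add(v))) {
          buf(slot) = buf(slot).asInstanceOf[Long] + v.asInstanceOf[Long]
        }
      }

      def merge(value: Row, buf: MutableRow): Unit = {
        buf(slot) = buf(slot).asInstanceOf[Long] + value(slot).asInstanceOf[Long]
      }

      def terminate(buffer: Row): Any = buffer(slot)
    }

    object UdafSketch {
      private val agg = LongSum(childOrdinal = 0)
      agg.initialize(Seq(BoundReference(0)))

      private def row(v: Any): Row = new Row { def apply(i: Int): Any = v }

      // Build one partial aggregation buffer from a batch of values.
      def partial(values: Seq[Long], distinct: Boolean): MutableRow = {
        val buf = new MutableRow(1)
        agg.reset(buf)
        val seen: JSet[Any] = if (distinct) new JHashSet[Any]() else null
        values.foreach(v => agg.update(row(v), buf, seen))
        buf
      }

      def sumOf(values: Seq[Long], distinct: Boolean): Any =
        agg.terminate(partial(values, distinct))

      // Combine two partial buffers, writing the result into the second one.
      def mergedSum(left: Seq[Long], right: Seq[Long]): Any = {
        val buf = partial(right, distinct = false)
        agg.merge(partial(left, distinct = false), buf)
        agg.terminate(buf)
      }
    }
    ```
    With these stand-ins, `UdafSketch.sumOf(Seq(1L, 2L, 2L), distinct = false)`
    yields 5 and the same call with `distinct = true` yields 3, since the
    repeated 2 is filtered by the `seen` set.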

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chenghao-intel/spark udaf_refactor

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5542.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5542
    
----
commit a6fdd419a63c241f2d21ca3bebc0222cf96e9280
Author: Cheng Hao <[email protected]>
Date:   2015-04-11T06:08:23Z

    migrate to support both versions of UDAF

commit bfe30158023c6f876fd9112d456e749b35afd40c
Author: Cheng Hao <[email protected]>
Date:   2015-04-11T06:33:12Z

    Update the unit tests to comment out the unsupported ones

commit 3a8232ccdf5abcfd6962be8dc6f3e50dbb8a6f88
Author: Cheng Hao <[email protected]>
Date:   2015-04-13T23:05:37Z

    update the interface name

commit dced96cde1f650f1fd098cc77504d6858febc104
Author: Cheng Hao <[email protected]>
Date:   2015-04-15T02:43:40Z

    change the update method from Any to Row

commit 29202bd89e028ce534b1b986c3444c362c29a3d8
Author: Cheng Hao <[email protected]>
Date:   2015-04-15T18:29:51Z

    move the distinct into the udaf

commit 967716b74d7cdbd3ea382e5b657a3952b3ef585f
Author: Cheng Hao <[email protected]>
Date:   2015-04-16T07:50:31Z

    simplify the aggregate expression by using the Projection

commit 504fbe52abc191e873cc741fe9ef44d9e4013d7a
Author: Cheng Hao <[email protected]>
Date:   2015-04-16T08:23:43Z

    revert the unnecessary changes

commit e9017ed23e946a3cc0d7e5142c05a19039491c58
Author: Cheng Hao <[email protected]>
Date:   2015-04-16T17:45:12Z

    Add Unit test

----

