GitHub user yhuai opened a pull request:

    https://github.com/apache/spark/pull/7458

    [SPARK-4366] [SQL] [WIP] Aggregation Improvement

    https://issues.apache.org/jira/browse/SPARK-4366
    
    This work adds a new code path for evaluating aggregate functions. To enable this path, set `spark.sql.useAggregate2` to `true`.
    
    This WIP PR contains:
    * A new aggregate function interface (`AggregateFunction2`) and two example aggregate functions (`Average` and `MyDoubleSum`).
    * A sort-based aggregate operator for the new aggregate function interface (`Aggregate2Sort`).
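
To make the two items above concrete, here is a minimal, self-contained sketch of the general idea: an aggregate function described by a buffer plus update/merge/evaluate steps, and a sort-based operator that keeps one buffer per contiguous group. Every name in it (`SimpleAggregate`, `SimpleAverage`, `SortBasedAggregation`) is hypothetical and chosen for illustration only. It does not reproduce the actual `AggregateFunction2` or `Aggregate2Sort` code in this PR, and it runs on plain Scala collections rather than on Spark rows.

```scala
// Hypothetical, simplified stand-in for an aggregate-function interface.
// These names are NOT taken from the PR; they only illustrate the idea.
trait SimpleAggregate[In, Buf, Out] {
  def zero: Buf                          // initial aggregation buffer
  def update(buf: Buf, input: In): Buf   // fold one input value into the buffer
  def merge(left: Buf, right: Buf): Buf  // combine two partial buffers
  def evaluate(buf: Buf): Out            // produce the final result
}

// Average keeps a running (sum, count) pair as its buffer.
object SimpleAverage extends SimpleAggregate[Double, (Double, Long), Option[Double]] {
  val zero: (Double, Long) = (0.0, 0L)

  def update(buf: (Double, Long), input: Double): (Double, Long) =
    (buf._1 + input, buf._2 + 1L)

  def merge(left: (Double, Long), right: (Double, Long)): (Double, Long) =
    (left._1 + right._1, left._2 + right._2)

  // An empty group has no average, so return None instead of dividing by zero.
  def evaluate(buf: (Double, Long)): Option[Double] =
    if (buf._2 == 0L) None else Some(buf._1 / buf._2)
}

// Sort-based aggregation: after sorting by key, every group is contiguous,
// so a single buffer is enough at any point in the scan.
object SortBasedAggregation {
  def aggregate[K: Ordering, In, Buf, Out](
      rows: Seq[(K, In)],
      agg: SimpleAggregate[In, Buf, Out]): Vector[(K, Out)] = {
    val sorted = rows.sortBy(_._1)
    val out = Vector.newBuilder[(K, Out)]
    var current: Option[(K, Buf)] = None

    for ((key, value) <- sorted) {
      current match {
        case Some((k, buf)) if k == key =>
          // Same group: keep folding into the current buffer.
          current = Some((k, agg.update(buf, value)))
        case Some((k, buf)) =>
          // Group boundary: emit the finished group, start a new buffer.
          out += ((k, agg.evaluate(buf)))
          current = Some((key, agg.update(agg.zero, value)))
        case None =>
          // First row of the input.
          current = Some((key, agg.update(agg.zero, value)))
      }
    }
    // Emit the last open group, if any.
    current.foreach { case (k, buf) => out += ((k, agg.evaluate(buf))) }
    out.result()
  }
}
```

In this sketch, `SortBasedAggregation.aggregate(Seq(("b", 4.0), ("a", 1.0), ("a", 3.0)), SimpleAverage)` groups the two `"a"` values together after sorting and averages each group independently. The `merge` step is not used by the single-pass scan; it is included because partial buffers from different partitions would need to be combined in a distributed setting.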
    
    There are two remaining tasks for this prototype.
    - [ ] `DISTINCT` aggregation support.
    - [ ] UDAF interface (based on `AggregateFunction2`).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yhuai/spark UDAF

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/7458.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #7458
    
----
commit dded1c5a740014cf332586488701b76e2cbbbdbc
Author: Yin Huai <[email protected]>
Date:   2015-07-10T02:10:24Z

    wip

commit f7996d0fa9c4c9f3b40f6630130cb92c38d1eefd
Author: Michael Armbrust <[email protected]>
Date:   2015-07-10T21:47:45Z

    Add AlgebraicAggregate

commit 6bbc6ba7ff33998ee0cf62d6f89a9232f7f38d1b
Author: Michael Armbrust <[email protected]>
Date:   2015-07-10T22:48:23Z

    now with correct answers\!

commit 5c00f3fa64b600a619adcf5fc2ec09286939b03a
Author: Michael Armbrust <[email protected]>
Date:   2015-07-11T00:33:54Z

    First draft of codegen

commit b7720ba2b33a96e0c3ee3aa19e1c4cf39b5fec0d
Author: Yin Huai <[email protected]>
Date:   2015-07-13T21:37:44Z

    Add an analysis rule to convert aggregate function to the new version.

commit 39ee975e0ed1fc5fa27a6a1feb25b2c6460ada84
Author: Yin Huai <[email protected]>
Date:   2015-07-13T22:55:51Z

    Code cleanup: Remove unnecessary AttributeReferences.

commit f7d9e541143502d325cdd9c9b674f49b26b3bcc8
Author: Michael Armbrust <[email protected]>
Date:   2015-07-13T23:27:44Z

    Merge remote-tracking branch 'apache/master' into UDAF

commit 072209fdc4777a078f5c85c8f2e0296210118ec4
Author: Yin Huai <[email protected]>
Date:   2015-07-14T04:25:45Z

    Bug fix: Handle expressions in grouping columns that are not attribute references.

commit 1b0bb3f5a4602658ca4192593d33d38733a4f34f
Author: Yin Huai <[email protected]>
Date:   2015-07-14T04:26:26Z

    Do not bind references in AlgebraicAggregate and use code gen for all places.

commit 8cfa6a9f269b1f94662f8931a0c4efd6543642a5
Author: Michael Armbrust <[email protected]>
Date:   2015-07-14T06:18:09Z

    add test

commit 1b490edea32c124865a425175e6fe9b5941fe049
Author: Michael Armbrust <[email protected]>
Date:   2015-07-14T06:30:27Z

    make hive test

commit 4435f20e3f5b95cc3023e73b86c10f1d5bc878aa
Author: Yin Huai <[email protected]>
Date:   2015-07-14T19:24:46Z

    Add ConvertAggregateFunction to HiveContext's analyzer.

commit 2857b55bb369f42a137b79fb83a50309e3cd0834
Author: Yin Huai <[email protected]>
Date:   2015-07-14T19:24:53Z

    Merge remote-tracking branch 'upstream/master' into UDAF

commit aff9534fc1279b2fbd59d981f721538ccba2b659
Author: Yin Huai <[email protected]>
Date:   2015-07-15T02:15:48Z

    Make Aggregate2Sort work with both algebraic AggregateFunctions and non-algebraic AggregateFunctions.

commit 5b46d415930bad119f4fe3b58303e8d7bd4595f3
Author: Yin Huai <[email protected]>
Date:   2015-07-16T06:56:51Z

    Bug fix.

commit 32aea9c574c09ed1fad9e836a236c0d5b0eae98a
Author: Yin Huai <[email protected]>
Date:   2015-07-16T17:40:11Z

    Merge remote-tracking branch 'upstream/master' into UDAF
    
    Conflicts:
        sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala

commit d821a347298ee6d2f78330bab60d89540b7f5ceb
Author: Yin Huai <[email protected]>
Date:   2015-07-16T22:13:48Z

    Cleanup.

commit 4721936ba952f8eb0aff1de533ba328c0e018773
Author: Yin Huai <[email protected]>
Date:   2015-07-16T22:19:21Z

    Add CheckAggregateFunction to extendedCheckRules.

commit 70b169c981ece62cbf755169043cbe1239da9afa
Author: Yin Huai <[email protected]>
Date:   2015-07-16T23:57:02Z

    Remove groupOrdering.

commit 6edb5ace6a6939c8a46842d981a6b57ceb784db7
Author: Yin Huai <[email protected]>
Date:   2015-07-17T04:04:27Z

    Format update.

----

