[ https://issues.apache.org/jira/browse/ARROW-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Francois Saint-Jacques reassigned ARROW-4124:
---------------------------------------------

    Assignee: Francois Saint-Jacques

> [C++] Abstract aggregation kernel API
> -------------------------------------
>
>                 Key: ARROW-4124
>                 URL: https://issues.apache.org/jira/browse/ARROW-4124
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Wes McKinney
>            Assignee: Francois Saint-Jacques
>            Priority: Major
>             Fix For: 0.13.0
>
>
> Before working out the details of implementing particular aggregation types, 
> we should first put some effort into the abstract API for aggregating 
> data in a multi-threaded setting.
> Aggregators must support both hash/grouped modes (e.g. "group by" in SQL or 
> data frame libraries) and non-grouped modes.
> Aggregations ideally should also support filter pushdown. For example:
> {code}
> select $AGG($EXPR)
> from $TABLE
> where $PREDICATE
> {code}
> Some systems materialize the post-predicate (filtered) version of 
> {{$EXPR}} and then aggregate it; pandas, for example, does this. Vectorized 
> performance can be much improved by applying the filter inside the 
> aggregation kernel instead. How the predicate's true/false values are 
> handled may depend on the implementation details of the kernel (e.g. SUM or 
> MEAN will be handled a bit differently from PRODUCT).
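
The issue does not prescribe an interface. The following is a minimal sketch of one possible design, assuming a hypothetical API: the names SumOp, ProductOp, MeanOp, ConsumeBatch, ConsumeGroupedBatch and MergeGroups are illustrative only and are not Arrow's C++ API. Each kernel keeps a small mergeable state, and the predicate result is passed in per row, so the filter is applied inside the kernel rather than by materializing a filtered copy of $EXPR. Non-grouped and hash/grouped modes share the same per-kernel operations, and partial states built on different threads are combined with Merge. The sketch also makes the SUM/MEAN/PRODUCT difference concrete: a filtered-out row contributes 0 to SUM and 1 to PRODUCT, while MEAN must also leave it out of its row count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <unordered_map>
#include <vector>

// Hypothetical kernels: a plain State plus Update/Merge/Finalize.
// Update receives the WHERE-predicate result for the row.
struct SumOp {
  struct State { double sum = 0.0; };
  // A filtered-out row contributes the additive identity, 0.
  static void Update(State& s, double v, bool selected) { s.sum += selected ? v : 0.0; }
  static void Merge(State& dst, const State& src) { dst.sum += src.sum; }
  static double Finalize(const State& s) { return s.sum; }
};

struct ProductOp {
  struct State { double product = 1.0; };
  // A filtered-out row contributes the multiplicative identity, 1.
  static void Update(State& s, double v, bool selected) { s.product *= selected ? v : 1.0; }
  static void Merge(State& dst, const State& src) { dst.product *= src.product; }
  static double Finalize(const State& s) { return s.product; }
};

struct MeanOp {
  struct State { double sum = 0.0; int64_t count = 0; };
  // Both the running sum and the row count must ignore filtered-out rows.
  static void Update(State& s, double v, bool selected) {
    s.sum += selected ? v : 0.0;
    s.count += selected ? 1 : 0;
  }
  static void Merge(State& dst, const State& src) {
    dst.sum += src.sum;
    dst.count += src.count;
  }
  // Mean of zero selected rows is undefined; return NaN.
  static double Finalize(const State& s) {
    return s.count == 0 ? std::numeric_limits<double>::quiet_NaN()
                        : s.sum / static_cast<double>(s.count);
  }
};

// Non-grouped mode: fold one batch into a partial state (e.g. one per thread).
template <typename Op>
void ConsumeBatch(typename Op::State& state, const std::vector<double>& values,
                  const std::vector<bool>& selection) {
  assert(values.size() == selection.size());
  for (std::size_t i = 0; i < values.size(); ++i) {
    Op::Update(state, values[i], selection[i]);
  }
}

// Hash/grouped mode: one State per group key.
template <typename Op>
using GroupedState = std::unordered_map<int64_t, typename Op::State>;

template <typename Op>
void ConsumeGroupedBatch(GroupedState<Op>& groups, const std::vector<int64_t>& keys,
                         const std::vector<double>& values,
                         const std::vector<bool>& selection) {
  assert(keys.size() == values.size() && values.size() == selection.size());
  for (std::size_t i = 0; i < values.size(); ++i) {
    // Skip (rather than neutralize) filtered rows so that a group whose rows
    // all fail the predicate never appears in the output, as with SQL WHERE.
    if (selection[i]) Op::Update(groups[keys[i]], values[i], true);
  }
}

// Combine grouped partial states produced by different threads.
template <typename Op>
void MergeGroups(GroupedState<Op>& dst, const GroupedState<Op>& src) {
  for (const auto& kv : src) Op::Merge(dst[kv.first], kv.second);
}
```

For example, calling ConsumeBatch<SumOp> on values {2, 3, 4, 5} with selection {true, false, true, false} yields 6, and ProductOp yields 8 on the same input. Because each kernel only exposes Update, Merge and Finalize on a small state, the same kernel can be driven by a single-state loop, a per-key hash table, or a multi-threaded driver that merges partial states at the end.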



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
