[ https://issues.apache.org/jira/browse/ARROW-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Francois Saint-Jacques reassigned ARROW-4124:
---------------------------------------------

    Assignee: Francois Saint-Jacques

> [C++] Abstract aggregation kernel API
> -------------------------------------
>
>                 Key: ARROW-4124
>                 URL: https://issues.apache.org/jira/browse/ARROW-4124
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Wes McKinney
>            Assignee: Francois Saint-Jacques
>            Priority: Major
>             Fix For: 0.13.0
>
>
> Related to the particular details of implementing various aggregation types,
> we should first put a bit of energy into the abstract API for aggregating
> data in a multi-threaded setting.
> Aggregators must support both hash/group (e.g. "group by" in SQL or data
> frame libraries) modes and non-group modes.
> Aggregations ideally should also support filter pushdown. For example:
> {code}
> select $AGG($EXPR)
> from $TABLE
> where $PREDICATE
> {code}
> Some systems might materialize the post-predicate / filtered version of
> {{$EXPR}}, then aggregate that. pandas does this for example. Vectorized
> performance can be much improved by filtering inside the aggregation kernel.
> How the predicate true/false values are handled may depend on the
> implementation details of the kernel (e.g. SUM or MEAN will be a bit
> different from PRODUCT)

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
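
To make the filter pushdown idea in the issue description concrete, here is a minimal C++ sketch for the non-group case. Every name in it (SumAggregator, ProductAggregator, Chunk, AggregateChunks, and the Consume/Merge split) is hypothetical and chosen for illustration; none of it is taken from the Arrow codebase, and it is not the API the issue eventually produced. It assumes each chunk arrives with an already-evaluated boolean mask for $PREDICATE, and that each chunk's partial state could be filled by a separate thread before the merge step. The loop below runs the chunks sequentially to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One slice of the input column plus the predicate result for each row.
// (Hypothetical type; a real kernel would read Arrow arrays and bitmaps.)
struct Chunk {
  std::vector<int64_t> values;
  std::vector<bool> mask;
};

// SUM keeps a row count as well, so MEAN can be derived from the same state.
struct SumState {
  int64_t sum = 0;
  int64_t count = 0;
};

struct SumAggregator {
  using State = SumState;

  // Rows with a false mask contribute 0, the additive identity, so no
  // filtered copy of the values is ever materialized.
  static void Consume(const Chunk& chunk, State* state) {
    assert(chunk.values.size() == chunk.mask.size());
    for (std::size_t i = 0; i < chunk.values.size(); ++i) {
      const int64_t keep = chunk.mask[i] ? 1 : 0;
      state->sum += keep * chunk.values[i];
      state->count += keep;
    }
  }

  // Combine a partial state (e.g. from another thread) into `state`.
  static void Merge(const State& other, State* state) {
    state->sum += other.sum;
    state->count += other.count;
  }
};

struct ProductState {
  int64_t product = 1;
};

struct ProductAggregator {
  using State = ProductState;

  // Rows with a false mask must contribute 1, the multiplicative identity.
  // Multiplying the value by the mask, as SUM does, would zero the result.
  static void Consume(const Chunk& chunk, State* state) {
    assert(chunk.values.size() == chunk.mask.size());
    for (std::size_t i = 0; i < chunk.values.size(); ++i) {
      state->product *= chunk.mask[i] ? chunk.values[i] : 1;
    }
  }

  static void Merge(const State& other, State* state) {
    state->product *= other.product;
  }
};

// Build one partial state per chunk, then merge them. The first loop is the
// part that could be distributed across threads, since the partial states
// are independent of each other.
template <typename Aggregator>
typename Aggregator::State AggregateChunks(const std::vector<Chunk>& chunks) {
  std::vector<typename Aggregator::State> partials(chunks.size());
  for (std::size_t i = 0; i < chunks.size(); ++i) {
    Aggregator::Consume(chunks[i], &partials[i]);
  }
  typename Aggregator::State result;
  for (const auto& partial : partials) {
    Aggregator::Merge(partial, &result);
  }
  return result;
}
```

With this shape, calling AggregateChunks<SumAggregator>(chunks) yields a state whose sum / count gives the mean over the rows that passed the predicate, while AggregateChunks<ProductAggregator>(chunks) needs a different treatment of masked-out rows. That is the SUM/MEAN versus PRODUCT difference the description mentions. A hash/group mode would need a state per group key and is not covered by this sketch.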