Philipp Moritz created ARROW-5002:
-------------------------------------

             Summary: [C++] Implement GroupBy
                 Key: ARROW-5002
                 URL: https://issues.apache.org/jira/browse/ARROW-5002
             Project: Apache Arrow
          Issue Type: Improvement
            Reporter: Philipp Moritz


Dear all,

I wonder what the best way forward is for implementing GroupBy kernels. 
Initially this was part of

https://issues.apache.org/jira/browse/ARROW-4124

but is not contained in the current implementation as far as I can tell.

It seems that the part of GroupBy that just returns the group indices could be
conveniently implemented with the HashKernel, which seems useful in any case.
Is that indeed the best way forward, and should it be done?
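
To make the "returns indices" part concrete, here is a minimal sketch in plain
C++ of what such a hash-based step would compute. It does not use the Arrow
API: GroupIds and ComputeGroupIds are hypothetical names, and a
std::unordered_map stands in for whatever hash table the HashKernel uses
internally. Each row gets a dense group id, and the unique keys are kept in
order of first appearance.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical result type (not an Arrow type): the distinct keys plus,
// for every input row, the index of its key in unique_keys.
struct GroupIds {
  std::vector<std::string> unique_keys;  // keys in order of first appearance
  std::vector<int32_t> ids;              // ids[i] indexes into unique_keys
};

// Hypothetical helper: assigns a dense group id to every row in one pass.
GroupIds ComputeGroupIds(const std::vector<std::string>& keys) {
  GroupIds out;
  out.ids.reserve(keys.size());
  std::unordered_map<std::string, int32_t> seen;
  for (const auto& key : keys) {
    // Try to register the key under the next free id; if the key is already
    // present, emplace leaves the map unchanged and returns the existing entry.
    auto result =
        seen.emplace(key, static_cast<int32_t>(out.unique_keys.size()));
    if (result.second) {
      out.unique_keys.push_back(key);
    }
    out.ids.push_back(result.first->second);
  }
  return out;
}
```

For the keys {"a", "b", "a", "c", "b"} this yields three unique keys and one
group id per row, with rows that share a key sharing an id.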

GroupBy + Aggregate could then be implemented in one of two ways: either by
combining the above with the Take kernel and a separate aggregation step
(which involves more memory copies than necessary), or as part of the
aggregate kernel itself. The latter is probably preferable. Any thoughts on
that?
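
To illustrate the trade-off, here is a rough sketch of both options for a sum
aggregate, again in plain C++ rather than against the Arrow API. SumViaTake
and SumFused are hypothetical names, and both assume that a dense group id per
row has already been computed, that ids and values have the same length, and
that every id lies in [0, num_groups).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Option 1 (hypothetical): gather each group's values first, similar in
// spirit to a Take per group, then aggregate. Every value is copied once
// into an intermediate buffer before it is summed.
std::vector<int64_t> SumViaTake(const std::vector<int32_t>& ids,
                                const std::vector<int64_t>& values,
                                int32_t num_groups) {
  std::vector<std::vector<int64_t>> taken(num_groups);
  for (std::size_t i = 0; i < values.size(); ++i) {
    taken[ids[i]].push_back(values[i]);
  }
  std::vector<int64_t> sums(num_groups, 0);
  for (int32_t g = 0; g < num_groups; ++g) {
    for (int64_t v : taken[g]) {
      sums[g] += v;
    }
  }
  return sums;
}

// Option 2 (hypothetical): aggregate directly into one accumulator per
// group, as an aggregate kernel aware of group ids could do. No
// intermediate copies of the values are made.
std::vector<int64_t> SumFused(const std::vector<int32_t>& ids,
                              const std::vector<int64_t>& values,
                              int32_t num_groups) {
  std::vector<int64_t> sums(num_groups, 0);
  for (std::size_t i = 0; i < values.size(); ++i) {
    sums[ids[i]] += values[i];
  }
  return sums;
}
```

Both functions return the same per-group sums; the difference is only the
extra per-group buffers in the first one, which is the memory-copy overhead
in question.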

Am I missing any other JIRAs related to this?

Best, Philipp.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
