[ 
https://issues.apache.org/jira/browse/ARROW-8901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17114518#comment-17114518
 ] 

Wes McKinney commented on ARROW-8901:
-------------------------------------

We probably need at least int8 through int64 index types (so that we can use
take to unpack dictionaries). A different code path will probably be used for
running "take" in a selection vector context (per ARROW-8903).
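
To make the dictionary case concrete, here is a minimal pure-Python sketch of
decoding a dictionary-encoded column with a take operation. It is not Arrow's
implementation: the `take` helper and the sample data are hypothetical, and
the int8 indices are modeled with the standard library's `array` typecode
`'b'` (a signed char, one byte on common platforms).

```python
from array import array


def take(values, indices):
    """Gather values[i] for each i in indices (hypothetical helper)."""
    return [values[i] for i in indices]


# The dictionary holds each distinct value once.
dictionary = ["apple", "banana", "cherry"]

# Small dictionaries are typically indexed with a narrow integer type;
# typecode 'b' stores each code as a signed char.
indices = array("b", [0, 2, 2, 1, 0])

# Unpacking the dictionary is just a take on the dictionary values.
decoded = take(dictionary, indices)
```

If take had no kernel for int8 indices, this decode step would presumably need
a cast of the indices to a wider type first.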

> [C++] Reduce number of take kernels
> -----------------------------------
>
>                 Key: ARROW-8901
>                 URL: https://issues.apache.org/jira/browse/ARROW-8901
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Wes McKinney
>            Priority: Major
>
> After ARROW-8792 we can observe that we are generating 312 take kernels:
> {code}
> In [1]: import pyarrow.compute as pc
>
> In [2]: reg = pc.function_registry()
>
> In [3]: reg.get_function('take')
> Out[3]:
> arrow.compute.Function
> kind: vector
> num_kernels: 312
> {code}
> You can see them all here: 
> https://gist.github.com/wesm/c3085bf40fa2ee5e555204f8c65b4ad5
> It's probably going to be sufficient to only support int16, int32, and int64 
> index types for almost all types and insert implicit casts (once we implement 
> implicit-cast-insertion into the execution code) for other index types. If we 
> determine that there is some performance hot path where we need to specialize 
> for other index types, then we can always do that.
> Additionally, we should be able to collapse the date/time kernels since we're 
> just moving memory.
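
The two reductions proposed in the description can be sketched in pure Python
as follows. This is an illustration under stated assumptions, not how Arrow's
dispatch works: `resolve_index_type`, `kernel_key`, the promotion table, and
the `fixed_width[N]` naming are all hypothetical. The byte widths of the
temporal types are the only Arrow facts used.

```python
# Assumption: only these index types get dedicated take kernels.
SUPPORTED_INDEX_TYPES = ("int16", "int32", "int64")

# Assumed promotion table: each other index type is implicitly cast to the
# smallest supported signed type that can hold all of its values.
IMPLICIT_CASTS = {
    "int8": "int16",
    "uint8": "int16",
    "uint16": "int32",
    "uint32": "int64",
}

# Physical byte width of each date/time type. Take only moves memory, so
# types with the same width can share one kernel.
TEMPORAL_BYTE_WIDTH = {
    "date32": 4, "time32": 4,
    "date64": 8, "time64": 8, "timestamp": 8, "duration": 8,
}


def resolve_index_type(index_type):
    """Return the index type whose kernel would actually run."""
    if index_type in SUPPORTED_INDEX_TYPES:
        return index_type
    if index_type in IMPLICIT_CASTS:
        return IMPLICIT_CASTS[index_type]
    raise TypeError(f"no take kernel or implicit cast for {index_type!r}")


def kernel_key(value_type, index_type):
    """Map a (value type, index type) pair to the kernel that serves it."""
    width = TEMPORAL_BYTE_WIDTH.get(value_type)
    value_key = value_type if width is None else f"fixed_width[{width}]"
    return (value_key, resolve_index_type(index_type))


# Count the distinct kernels needed for every temporal/index combination.
all_index_types = list(SUPPORTED_INDEX_TYPES) + list(IMPLICIT_CASTS)
keys = {kernel_key(v, i) for v in TEMPORAL_BYTE_WIDTH for i in all_index_types}
```

In this toy registry, 6 temporal types times 7 index types (42 combinations)
collapse to 6 kernels: 2 physical widths times 3 supported index types.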


