[ https://issues.apache.org/jira/browse/ARROW-8901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17114518#comment-17114518 ]
Wes McKinney commented on ARROW-8901:
-------------------------------------

We probably need at least int8 through int64 (so we can use take to unpack dictionaries). A different code path will probably be used for running "take" in a selection vector context (per ARROW-8903).

> [C++] Reduce number of take kernels
> -----------------------------------
>
> Key: ARROW-8901
> URL: https://issues.apache.org/jira/browse/ARROW-8901
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Wes McKinney
> Priority: Major
>
> After ARROW-8792 we can observe that we are generating 312 take kernels
> {code}
> In [1]: import pyarrow.compute as pc
>
> In [2]: reg = pc.function_registry()
>
> In [3]: reg.get_function('take')
>
> Out[3]:
> arrow.compute.Function
> kind: vector
> num_kernels: 312
> {code}
> You can see them all here:
> https://gist.github.com/wesm/c3085bf40fa2ee5e555204f8c65b4ad5
> It's probably going to be sufficient to only support int16, int32, and int64
> index types for almost all types and insert implicit casts (once we implement
> implicit-cast-insertion into the execution code) for other index types. If we
> determine that there is some performance hot path where we need to specialize
> for other index types, then we can always do that.
> Additionally, we should be able to collapse the date/time kernels since we're
> just moving memory.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
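A minimal Python sketch of the two reductions discussed above, under a toy model that is an assumption and not Arrow's actual kernel registry: a take kernel is selected by (value byte width, index type). Following the comment, it keeps dedicated kernels for int8 through int64 (int8 matters for dictionary codes); unsigned index types are widened to a signed type by a checked implicit cast before dispatch, and date/time types reuse the integer kernel of the same byte width because take only copies fixed-width slots. All names (`KERNELS`, `resolve_index_type`, `cast_indices`, `IMPLICIT_INDEX_CASTS`, etc.) are hypothetical, as is the choice to map uint64 to int64 with a range check.

```python
import struct

# Hypothetical names throughout; this is not Arrow's kernel registry.

# Index types that get a dedicated kernel (int8 through int64, per the comment).
SUPPORTED_INDEX_TYPES = {"int8": 8, "int16": 16, "int32": 32, "int64": 64}

# Assumed implicit-cast rule: widen each unsigned index type to a signed type
# that can hold all of its values. uint64 has no such type, so it maps to
# int64 and relies on the range check in cast_indices().
IMPLICIT_INDEX_CASTS = {"uint8": "int16", "uint16": "int32",
                        "uint32": "int64", "uint64": "int64"}

# Value types keyed by physical byte width. Date/time types reuse the integer
# kernel of the same width because take only copies fixed-width slots.
BYTE_WIDTH = {"int32": 4, "date32": 4, "time32": 4,
              "int64": 8, "date64": 8, "time64": 8, "timestamp": 8}


def take_fixed_width(values: bytes, width: int, indices) -> bytes:
    """Gather width-byte slots from values at the given positions."""
    n = len(values) // width
    out = bytearray()
    for i in indices:
        if not 0 <= i < n:
            raise IndexError(f"index {i} out of bounds for length {n}")
        out += values[i * width:(i + 1) * width]
    return bytes(out)


def make_kernel(width: int):
    # In C++ each (width, index type) pair would be its own template
    # instantiation; in Python they can share one function body.
    def kernel(values, indices):
        return take_fixed_width(values, width, indices)
    return kernel


# One kernel per (byte width, supported index type) pair.
KERNELS = {(w, t): make_kernel(w)
           for w in sorted(set(BYTE_WIDTH.values()))
           for t in SUPPORTED_INDEX_TYPES}


def resolve_index_type(index_type: str) -> str:
    """Return the index type whose kernel will run, after any implicit cast."""
    if index_type in SUPPORTED_INDEX_TYPES:
        return index_type
    if index_type in IMPLICIT_INDEX_CASTS:
        return IMPLICIT_INDEX_CASTS[index_type]
    raise TypeError(f"unsupported index type: {index_type}")


def cast_indices(indices, target: str):
    """Checked cast: fail instead of wrapping when a value does not fit."""
    bits = SUPPORTED_INDEX_TYPES[target]
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    for i in indices:
        if not lo <= i <= hi:
            raise OverflowError(f"index {i} does not fit in {target}")
    return list(indices)


def take(values: bytes, value_type: str, indices, index_type: str) -> bytes:
    target = resolve_index_type(index_type)
    kernel = KERNELS[(BYTE_WIDTH[value_type], target)]
    return kernel(values, cast_indices(indices, target))


if __name__ == "__main__":
    # Dictionary unpacking: an int32 dictionary decoded with int8 codes.
    dictionary = struct.pack("<3i", 10, 20, 30)
    decoded = take(dictionary, "int32", [2, 0, 0, 1], "int8")
    print(struct.unpack("<4i", decoded))  # (30, 10, 10, 20)
    # date32 values (days since epoch) taken with uint8 indices,
    # which are implicitly widened to int16 first.
    days = struct.pack("<2i", 0, 18000)
    print(struct.unpack("<2i", take(days, "date32", [1, 0], "uint8")))  # (18000, 0)
    print(len(KERNELS))  # 8
```

In this toy model the seven value types and eight index types would need 56 kernels if every pair were specialized; grouping by byte width and routing unsigned indices through a cast leaves 8. The numbers describe only the sketch, not Arrow's 312.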