Andrew Lamb created ARROW-10141:
-----------------------------------

             Summary: [Rust][Arrow] Improve performance of filter kernel
                 Key: ARROW-10141
                 URL: https://issues.apache.org/jira/browse/ARROW-10141
             Project: Apache Arrow
          Issue Type: Improvement
            Reporter: Andrew Lamb


As [~jorgecarleitao] noted here: 
https://github.com/apache/arrow/pull/8303#issuecomment-701328143

The performance of the filter kernel (and likely others) could be improved by 
avoiding the creation of intermediate copies. The code currently:

# creates Vec<Option<T>> through an iteration
# copies Vec<Option<T>> to the two buffers (when from_opt_vec is called)

It may be more efficient to create the buffers during the iteration, so that we 
avoid the copy (Vec -> buffers). In other words, the code in from_opt_vec could 
be "injected" into the filter execution, where the MutableBuffer and the 
offsets and values buffers are created before the loop, and new elements are 
written directly to them. 
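
To make the difference concrete, here is a minimal sketch contrasting the two approaches. It does not use the arrow crate: SimpleArray, set_bit, filter_two_step and filter_direct are hypothetical names invented for illustration, a plain Vec<i32> stands in for the values buffer, and a Vec<u8> bitmap stands in for the validity buffer. Only from_opt_vec borrows its name from the text above, and its body here is an assumption about the general shape, not the real implementation.

```rust
// Hypothetical stand-in for an Arrow primitive array: a values buffer plus a
// validity bitmap (one bit per slot, 1 = valid). These are NOT arrow types.
#[derive(Debug, PartialEq)]
struct SimpleArray {
    values: Vec<i32>,
    validity: Vec<u8>,
    len: usize,
}

// Mark slot `i` as valid in the bitmap.
fn set_bit(bits: &mut [u8], i: usize) {
    bits[i / 8] |= 1 << (i % 8);
}

// Step 2 of the current approach: copy a Vec<Option<T>> into the two buffers.
fn from_opt_vec(data: Vec<Option<i32>>) -> SimpleArray {
    let len = data.len();
    let mut values = vec![0; len];
    let mut validity = vec![0u8; (len + 7) / 8];
    for (i, v) in data.iter().enumerate() {
        if let Some(x) = v {
            values[i] = *x;
            set_bit(&mut validity, i);
        }
    }
    SimpleArray { values, validity, len }
}

// Current approach: build an intermediate Vec<Option<T>>, then copy it.
fn filter_two_step(input: &[Option<i32>], mask: &[bool]) -> SimpleArray {
    let tmp: Vec<Option<i32>> = input
        .iter()
        .zip(mask)
        .filter(|(_, keep)| **keep)
        .map(|(v, _)| *v)
        .collect();
    from_opt_vec(tmp)
}

// Proposed approach: create the buffers before the loop and write each kept
// element straight into them, so no intermediate Vec<Option<T>> exists.
fn filter_direct(input: &[Option<i32>], mask: &[bool]) -> SimpleArray {
    let out_len = input.iter().zip(mask).filter(|(_, keep)| **keep).count();
    let mut values = Vec::with_capacity(out_len);
    let mut validity = vec![0u8; (out_len + 7) / 8];
    for (v, keep) in input.iter().zip(mask) {
        if !*keep {
            continue;
        }
        let i = values.len(); // index of the slot being written
        match v {
            Some(x) => {
                values.push(*x);
                set_bit(&mut validity, i);
            }
            None => values.push(0), // placeholder value under a null slot
        }
    }
    SimpleArray { values, validity, len: out_len }
}
```

Both functions should produce the same SimpleArray for the same input and mask; the second one simply skips the temporary allocation and the second pass over the data. Whether this yields a measurable speedup in the real kernel would need to be confirmed with benchmarks.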

(As a side note, this is why he proposed ARROW-10030, 
https://github.com/apache/arrow/pull/8211: IMO there is some boilerplate 
copy-pasting to

* initialize buffers
* iterate
* create ArrayData from buffers

which will continue to grow as we add more kernels, and whose pattern seems to 
be a FromIter of fixed size.)
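
As a rough illustration of how that three-step pattern might be factored out, the sketch below wraps "initialize buffers, iterate, hand back the buffers" in one generic helper driven by an iterator of known length. The name buffers_from_iter is hypothetical, and plain Vecs again stand in for Arrow's buffers; this is not the API proposed in ARROW-10030, only one way such a helper could look.

```rust
// Hypothetical helper: build a values buffer and a validity bitmap from any
// iterator whose length is known up front. Plain Vecs stand in for Arrow
// buffers; a real version would finish by creating ArrayData from them.
fn buffers_from_iter<T, I>(iter: I) -> (Vec<T>, Vec<u8>)
where
    T: Default,
    I: ExactSizeIterator<Item = Option<T>>,
{
    // 1. initialize buffers, sized once from the known length
    let len = iter.len();
    let mut values = Vec::with_capacity(len);
    let mut validity = vec![0u8; (len + 7) / 8];

    // 2. iterate, writing each element directly into the buffers
    for (i, item) in iter.enumerate() {
        match item {
            Some(v) => {
                values.push(v);
                validity[i / 8] |= 1 << (i % 8);
            }
            None => values.push(T::default()), // placeholder under a null slot
        }
    }

    // 3. return the buffers to the caller
    (values, validity)
}
```

A kernel could then express only its per-element logic as an iterator adaptor, for example buffers_from_iter(input.into_iter().map(|x| x.map(|n| n * 10))), and leave the buffer bookkeeping to the helper.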



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
