Hi Wes,

Thanks. Off the top of my head, that is similar to the algorithm I had in mind as well. Is this the JIRA you were referring to? [1] I see that some improvements have already been made here [2].
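For concreteness, here is a rough Python sketch of the lookup and batching steps described in the quoted message below. It is only an illustration under assumptions: plain Python lists stand in for Arrow arrays, and the names build_offsets, resolve, and chunked_take are made up for this sketch and are not Arrow APIs.

```python
from bisect import bisect_right
from itertools import accumulate, groupby


def build_offsets(chunk_lengths):
    """Start offset of each chunk, with the total length as the last entry."""
    return [0] + list(accumulate(chunk_lengths))


def resolve(offsets, index):
    """Map a logical index to (chunk number, index within that chunk).

    Uses a binary search over the offset table, so each lookup is O(log K).
    """
    if not 0 <= index < offsets[-1]:
        raise IndexError(f"index {index} out of bounds for length {offsets[-1]}")
    # bisect_right skips past empty chunks that share the same start offset.
    chunk = bisect_right(offsets, index) - 1
    return chunk, index - offsets[chunk]


def chunked_take(chunks, indices):
    """Take `indices` from a list of chunks without concatenating them.

    Hypothetical helper: plain lists stand in for Arrow arrays. Returns one
    output chunk per run of consecutive indices that hit the same input chunk.
    """
    offsets = build_offsets([len(c) for c in chunks])
    resolved = [resolve(offsets, i) for i in indices]
    out = []
    for chunk_no, run in groupby(resolved, key=lambda pair: pair[0]):
        chunk = chunks[chunk_no]
        # Bounds were already checked in resolve(), so index directly.
        out.append([chunk[rel] for _, rel in run])
    return out
```

With chunk sizes 10, 50, 100, 20, build_offsets gives [0, 10, 60, 160, 180], and resolving [5, 40, 100, 170] gives (0, 5), (1, 30), (2, 40), (3, 10), which matches the example in the quoted message.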
I guess bug reports like this [3] are also related to the same scenario. Is there anyone working on this?

Best

[1] https://issues.apache.org/jira/browse/ARROW-5454
[2] https://github.com/apache/arrow/pull/8823
[3] https://issues.apache.org/jira/browse/ARROW-10799

On Fri, Jan 15, 2021 at 10:38 AM Wes McKinney <[email protected]> wrote:

> You can do that, but note that the implementation is currently not
> efficient, see
>
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/vector_selection.cc#L1909
>
> Rather than pre-concatenating the chunks (which can easily fail) and
> then invoking Take on the resulting concatenated Array, it would be
> better to do a O(N log K) take on the chunks directly, where N is the
> number of take indices and K is the number of chunks.
>
> For example, if you have chunks of size
>
> 10
> 50
> 100
> 20
>
> then the algorithm computes the following offset table:
>
> 0
> 10
> 60
> 160
> 180
>
> Indices relative to the whole ChunkedArray are translated to (chunk
> number, intrachunk index), for example:
>
> take with [5, 40, 100, 170] is translated by doing binary searches in
> the offset table to:
>
> (chunk=0, relative_index=5)
> (1, 30)
> (2, 40)
> (3, 10)
>
> Consecutive indices from the same chunk are batched together and then
> Take is invoked on the respective chunk (with boundschecking disabled)
> to select a chunk for the resulting output ChunkedArray.
>
> Might be helpful to copy this to the appropriate Jira (I'm sure there
> is one already) to assist the person who implements this.
>
> Thanks,
> Wes
>
> On Mon, Jan 11, 2021 at 10:01 AM Niranda Perera
> <[email protected]> wrote:
> >
> > Hi all,
> >
> > I was wondering how the Take API works with ChunkedArrays?
> > ex: If we have a ChunkedArray[100] with Array1[50] and Array2[50]
> > so, if I want an element from each array, can I pass something like
> > [10, 60] as the indices?
> >
> > --
> > Niranda Perera
> > @n1r44
> > +1 812 558 8884 / +94 71 554 8430
> > https://www.linkedin.com/in/niranda
>

--
Niranda Perera
@n1r44 <https://twitter.com/N1R44>
+1 812 558 8884 / +94 71 554 8430
https://www.linkedin.com/in/niranda
