On Sun, Jan 17, 2021 at 8:59 AM Niranda Perera wrote:
>
> Hi Wes,
>
> Thanks. Off the top of my head, that was similar to the algorithm I had in
> mind as well.
> Is this the JIRA you were referring to? [1]
> I see that some improvements have already been made here [2].
>
> I guess bug reports like this one [3] are also related to the same scenario.
>
> Is anyone working on this?

If open Jira issues are not assigned to anyone, you can assume that no one is
working on them.

>
> Best
>
> [1] https://issues.apache.org/jira/browse/ARROW-5454
> [2] https://github.com/apache/arrow/pull/8823
> [3] https://issues.apache.org/jira/browse/ARROW-10799
>
> On Fri, Jan 15, 2021 at 10:38 AM Wes McKinney wrote:
>>
>> You can do that, but note that the implementation is currently not
>> efficient, see
>>
>> https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/vector_selection.cc#L1909
>>
>> Rather than pre-concatenating the chunks (which can easily fail) and
>> then invoking Take on the resulting concatenated Array, it would be
>> better to do an O(N log K) take on the chunks directly, where N is the
>> number of take indices and K is the number of chunks.
>>
>> For example, if you have chunks of size
>>
>> 10
>> 50
>> 100
>> 20
>>
>> then the algorithm computes the following offset table:
>>
>> 0
>> 10
>> 60
>> 160
>> 180
>>
>> Indices relative to the whole ChunkedArray are translated to (chunk
>> number, intra-chunk index). For example, take with [5, 40, 100, 170]
>> is translated by doing binary searches in the offset table to:
>>
>> (chunk=0, relative_index=5)
>> (1, 30)
>> (2, 40)
>> (3, 10)
>>
>> Consecutive indices from the same chunk are batched together, and then
>> Take is invoked on the respective chunk (with bounds checking disabled)
>> to produce one chunk of the resulting output ChunkedArray.
>>
>> It might be helpful to copy this to the appropriate Jira (I'm sure there
>> is one already) to assist the person who implements this.
>>
>> Thanks,
>> Wes
>>
>> On Mon, Jan 11, 2021 at 10:01 AM Niranda Perera wrote:
>> >
>> > Hi all,
>> >
>> > I was wondering how the Take API works with ChunkedArrays.
>> > For example, if we have a ChunkedArray[100] made of Array1[50] and
>> > Array2[50], and I want one element from each array, can I pass
>> > something like [10, 60] as the indices?
>> >
>> > --
>> > Niranda Perera
>> > @n1r44
>> > +1 812 558 8884 / +94 71 554 8430
>> > https://www.linkedin.com/in/niranda
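
For reference, here is a rough sketch of the chunk-wise take Wes describes, written in plain Python. It is a hypothetical illustration, not Arrow's implementation: chunks are ordinary Python lists standing in for Arrays, and the names `build_offsets`, `locate`, `batch_by_chunk`, and `chunked_take` are made up for this sketch. It rebuilds the offset table, maps each logical index to (chunk, intra-chunk index) with a binary search (`bisect_right`), and groups consecutive indices from the same chunk so that each group yields one output chunk. It also answers the original question: with two chunks of 50, logical indices [10, 60] map to (0, 10) and (1, 10).

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def build_offsets(chunk_lengths: Sequence[int]) -> List[int]:
    """offsets[k] is the logical start of chunk k; the last entry is the total length."""
    offsets = [0]
    for length in chunk_lengths:
        offsets.append(offsets[-1] + length)
    return offsets


def locate(offsets: Sequence[int], index: int) -> Tuple[int, int]:
    """Translate a logical index into (chunk number, intra-chunk index).

    bisect_right returns the first position whose offset is strictly greater
    than `index`; the containing chunk is the one just before it. Empty chunks
    (repeated offsets) are skipped automatically. Cost is O(log K) per index.
    """
    if not 0 <= index < offsets[-1]:
        raise IndexError(f"index {index} out of bounds for length {offsets[-1]}")
    chunk = bisect_right(offsets, index) - 1
    return chunk, index - offsets[chunk]


def batch_by_chunk(
    offsets: Sequence[int], take_indices: Sequence[int]
) -> List[Tuple[int, List[int]]]:
    """Group runs of consecutive take indices that fall in the same chunk."""
    batches: List[Tuple[int, List[int]]] = []
    for logical in take_indices:
        chunk, rel = locate(offsets, logical)
        if batches and batches[-1][0] == chunk:
            batches[-1][1].append(rel)
        else:
            batches.append((chunk, [rel]))
    return batches


def chunked_take(chunks: Sequence[Sequence[int]], take_indices: Sequence[int]) -> List[List[int]]:
    """Take from a list of chunks without concatenating them first.

    Each batch becomes one output chunk, mirroring a ChunkedArray result.
    The list comprehension per batch is a stand-in for calling Take on
    that single chunk.
    """
    offsets = build_offsets([len(c) for c in chunks])
    return [
        [chunks[chunk][i] for i in rel]
        for chunk, rel in batch_by_chunk(offsets, take_indices)
    ]


if __name__ == "__main__":
    offsets = build_offsets([10, 50, 100, 20])
    print(offsets)  # [0, 10, 60, 160, 180]
    print([locate(offsets, i) for i in [5, 40, 100, 170]])
    # [(0, 5), (1, 30), (2, 40), (3, 10)]

    chunks = [list(range(50)), list(range(50, 100))]
    print(chunked_take(chunks, [10, 60]))  # [[10], [60]]
```

Note that only consecutive indices are merged: [1, 2, 55, 3] against the two chunks above produces three output chunks, [[1, 2], [55], [3]], because the run in chunk 0 is interrupted by an index in chunk 1. Bounds are checked once per logical index in `locate`, which loosely corresponds to running the per-chunk Take with bounds checking disabled.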
