Dominic Sisneros
FAA, WSA Engineering Services, AJW-2W13B
Office: 801-320-2377
Cell: 801-558-1966
-----Original Message-----
From: Wes McKinney <[email protected]>
Sent: Friday, January 15, 2021 8:38 AM
To: [email protected]
Subject: Re: compute::Take & ChunkedArrays

You can do that, but note that the implementation is currently not efficient; see

https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/vector_selection.cc#L1909

Rather than pre-concatenating the chunks (which can easily fail) and then invoking Take on the resulting concatenated Array, it would be better to do an O(N log K) take on the chunks directly, where N is the number of take indices and K is the number of chunks.

For example, if you have chunks of size

10 50 100 20

then the algorithm computes the following offset table:

0 10 60 160 180

Indices relative to the whole ChunkedArray are translated to (chunk number, intrachunk index) pairs. For example, take with [5, 40, 100, 170] is translated, by doing binary searches in the offset table, to:

(chunk=0, relative_index=5)
(1, 30)
(2, 40)
(3, 10)

Consecutive indices from the same chunk are batched together, and then Take is invoked on the respective chunk (with bounds checking disabled) to produce one chunk of the resulting output ChunkedArray.

It might be helpful to copy this to the appropriate Jira (I'm sure there is one already) to assist the person who implements this.

Thanks,
Wes

On Mon, Jan 11, 2021 at 10:01 AM Niranda Perera <[email protected]> wrote:
>
> Hi all,
>
> I was wondering how the Take API works with ChunkedArrays?
> ex: If we have a ChunkedArray[100] with Array1[50] and Array2[50] so,
> if I want an element from each array, can I pass something like [10, 60] as
> the indices?
>
> --
> Niranda Perera
> @n1r44
> +1 812 558 8884 / +94 71 554 8430
> https://www.linkedin.com/in/niranda
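A minimal Python sketch of the chunked take Wes describes might look like the following. It assumes the chunks are modeled as plain Python lists rather than Arrow arrays. The helper names `build_offsets`, `translate`, and `chunked_take` are hypothetical, and the per-chunk list comprehension only stands in for invoking Arrow's Take kernel on one chunk with bounds checking disabled. Each index costs one binary search over the K+1 offsets, which gives the O(N log K) translation step.

```python
from bisect import bisect_right
from itertools import accumulate, groupby


def build_offsets(chunk_lengths):
    # Offset table: offsets[c] is the logical start of chunk c and the
    # last entry is the total length, e.g. [10, 50, 100, 20] -> [0, 10, 60, 160, 180].
    return [0] + list(accumulate(chunk_lengths))


def translate(offsets, index):
    # Map a logical index to (chunk number, intrachunk index) with one
    # O(log K) binary search over the offset table.
    if not 0 <= index < offsets[-1]:
        raise IndexError(f"index {index} out of bounds for length {offsets[-1]}")
    # bisect_right - 1 finds the last chunk whose start is <= index, so an
    # index sitting exactly on a boundary (e.g. 10) lands at the start of the
    # next chunk, and zero-length chunks are skipped.
    chunk = bisect_right(offsets, index) - 1
    return chunk, index - offsets[chunk]


def chunked_take(chunks, indices):
    # Hypothetical helper: returns the output "ChunkedArray" as a list of chunks.
    offsets = build_offsets(len(c) for c in chunks)
    pairs = [translate(offsets, i) for i in indices]
    result_chunks = []
    # Batch runs of consecutive indices that fall in the same chunk; each run
    # becomes one chunk of the output, so the output order matches `indices`.
    for chunk_id, run in groupby(pairs, key=lambda p: p[0]):
        relative = [r for _, r in run]
        # Stand-in for Take on a single chunk with bounds checking disabled:
        # every relative index was already validated by translate().
        result_chunks.append([chunks[chunk_id][r] for r in relative])
    return result_chunks
```

With chunks of sizes 10, 50, 100, and 20 whose values equal their logical positions, `chunked_take(chunks, [5, 40, 100, 170])` yields four single-element chunks `[5]`, `[40]`, `[100]`, `[170]`. Because only *consecutive* indices are batched, a sequence such as `[1, 2, 55, 3]` produces three output chunks (`[1, 2]`, `[55]`, `[3]`) rather than regrouping by source chunk. For Niranda's case of two 50-element chunks, the indices `[10, 60]` map to `(0, 10)` and `(1, 10)`.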
