Dominic Sisneros
FAA, WSA Engineering Services, AJW-2W13B
Office: 801-320-2377
Cell: 801-558-1966

-----Original Message-----
From: Wes McKinney <[email protected]> 
Sent: Friday, January 15, 2021 8:38 AM
To: [email protected]
Subject: Re: compute::Take & ChunkedArrays

You can do that, but note that the implementation is currently not efficient, 
see

https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/vector_selection.cc#L1909

Rather than pre-concatenating the chunks (which can easily fail) and then 
invoking Take on the resulting concatenated Array, it would be better to do an 
O(N log K) take on the chunks directly, where N is the number of take indices 
and K is the number of chunks.

For example, if you have chunks of size

10
50
100
20

then the algorithm computes the following offset table:

0
10
60
160
180

Indices relative to the whole ChunkedArray are translated to (chunk number, 
intrachunk index), for example:

take with [5, 40, 100, 170] is translated by doing binary searches in the 
offset table to:

(chunk=0, relative_index=5)
(1, 30)
(2, 40)
(3, 10)
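
The translation step above can be sketched in a few lines of plain Python. 
This is only an illustration of the offset table and binary search, not the 
Arrow implementation; build_offsets and resolve are hypothetical names, and 
the standard-library bisect module stands in for whatever search the C++ 
kernel would use.

```python
from bisect import bisect_right
from itertools import accumulate


def build_offsets(chunk_lengths):
    """Return the cumulative offset table, starting at 0 (K + 1 entries)."""
    return [0] + list(accumulate(chunk_lengths))


def resolve(offsets, index):
    """Map a logical index to (chunk number, intrachunk index).

    Hypothetical helper: one O(log K) binary search over the offset table.
    """
    if not 0 <= index < offsets[-1]:
        raise IndexError(f"index {index} out of bounds")
    # bisect_right finds the first offset strictly greater than index;
    # the chunk holding the index is the one just before it. This also
    # skips zero-length chunks, whose offsets are repeated in the table.
    chunk = bisect_right(offsets, index) - 1
    return chunk, index - offsets[chunk]


offsets = build_offsets([10, 50, 100, 20])    # [0, 10, 60, 160, 180]
pairs = [resolve(offsets, i) for i in [5, 40, 100, 170]]
# pairs == [(0, 5), (1, 30), (2, 40), (3, 10)]
```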

Consecutive indices from the same chunk are batched together and then Take is 
invoked on the respective chunk (with bounds checking disabled) to produce one 
chunk of the resulting output ChunkedArray.
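
A minimal sketch of the batching step, again in plain Python and under stated 
assumptions: chunks are modeled as ordinary lists, take_chunked is a 
hypothetical name, and the list comprehension stands in for a per-chunk Take 
call. Bounds are checked once against the total length, so the per-chunk 
selection does not need to check them again.

```python
from bisect import bisect_right
from itertools import accumulate, groupby


def take_chunked(chunks, indices):
    """Select `indices` from a list of chunks without concatenating them.

    Returns a list of output chunks, one per run of consecutive indices
    that fall in the same input chunk.
    """
    offsets = [0] + list(accumulate(len(c) for c in chunks))
    total = offsets[-1]

    # Translate each logical index to (chunk number, intrachunk index).
    resolved = []
    for i in indices:
        if not 0 <= i < total:
            raise IndexError(f"index {i} out of bounds for length {total}")
        c = bisect_right(offsets, i) - 1
        resolved.append((c, i - offsets[c]))

    # Batch consecutive indices that hit the same chunk, then select from
    # that chunk once per batch (stand-in for Take on a single Array).
    out = []
    for c, run in groupby(resolved, key=lambda pair: pair[0]):
        relative = [r for _, r in run]
        out.append([chunks[c][r] for r in relative])
    return out


chunks = [list(range(0, 10)), list(range(10, 60)),
          list(range(60, 160)), list(range(160, 180))]
result = take_chunked(chunks, [5, 6, 40, 7])
# result == [[5, 6], [40], [7]]
```

Note that output order follows the order of the take indices, so indices that 
jump back and forth between chunks produce many small output chunks.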

Might be helpful to copy this to the appropriate Jira (I'm sure there is one 
already) to assist the person who implements this.

Thanks,
Wes

On Mon, Jan 11, 2021 at 10:01 AM Niranda Perera <[email protected]> 
wrote:
>
> Hi all,
>
> I was wondering how the Take API works with ChunkedArrays?
> For example, if we have a ChunkedArray[100] made up of Array1[50] and 
> Array2[50], and I want an element from each array, can I pass something 
> like [10, 60] as the indices?
>
> --
> Niranda Perera
> @n1r44
> +1 812 558 8884 / +94 71 554 8430
> https://www.linkedin.com/in/niranda
