Re: Arrow with indirection?

Weston Pace Thu, 14 Sep 2023 05:42:39 -0700

I'm not entirely sure what kinds of operations you need.

With the exception of RLE (run-end encoded) arrays, all Arrow arrays support
constant-time (O(1)) random access.  So if you want to keep a pointer (in
practice, an index) to a particular element or row of data, that is
generally fine.
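As a rough sketch of what an indirection table over columnar data could look
like: the snippet below uses plain Python `array` objects as stand-ins for
Arrow arrays, so none of it is Arrow API. The column names and the helpers
`row_at` and `export` are hypothetical, made up for illustration. In pyarrow,
the gather step in `export` corresponds roughly to `Table.take` with an
integer index array.

```python
from array import array

# Hypothetical point-cloud columns; plain arrays stand in for Arrow arrays.
columns = {
    "x": array("d", [10.0, 11.0, 12.0, 13.0]),
    "y": array("d", [20.0, 21.0, 22.0, 23.0]),
    "z": array("d", [30.0, 31.0, 32.0, 33.0]),
}

# Indirection table: a subset of rows in a chosen order.
# The column data itself never moves.
indices = array("q", [3, 0, 2])


def row_at(columns, indices, i):
    """Return the i-th logical row by following the indirection table."""
    physical = indices[i]  # O(1) lookup in the table
    return {name: col[physical] for name, col in columns.items()}  # O(1) per column


def export(columns, indices):
    """Gather the selected rows into new, contiguous columns."""
    return {
        name: array(col.typecode, (col[j] for j in indices))
        for name, col in columns.items()
    }


first = row_at(columns, indices, 0)
exported = export(columns, indices)
```

Subsetting and sorting then only rewrite `indices`; the gather in `export` is
the one place where data is actually copied.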

On the other hand, you mentioned sorting.  One thing that is a little
challenging in Arrow is swapping two rows of data.  It's entirely possible,
and has the same algorithmic complexity (O(# columns)) as a row-based format,
but it is not as memory efficient, because you are doing a separate memory
swap for each array.
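To make the difference concrete, here is a minimal sketch in plain Python (not
Arrow API; the function names and the flat interleaved layout are assumptions
chosen for illustration). Both swaps touch one value per column, but the
columnar version touches a different buffer each time, while the row-based
version moves two contiguous runs of one buffer.

```python
from array import array


def swap_rows_columnar(columns, i, j):
    """Swap logical rows i and j: one element swap in each column buffer."""
    for col in columns.values():
        col[i], col[j] = col[j], col[i]


def swap_rows_rowwise(flat, ncols, i, j):
    """Swap rows i and j in a flat, row-major buffer of width ncols."""
    a, b = i * ncols, j * ncols
    # Each row is one contiguous run of ncols values.
    flat[a:a + ncols], flat[b:b + ncols] = flat[b:b + ncols], flat[a:a + ncols]


# Columnar layout: one buffer per column.
cols = {"x": array("d", [1.0, 2.0, 3.0]), "y": array("d", [4.0, 5.0, 6.0])}
swap_rows_columnar(cols, 0, 2)

# Row-based layout of the same data: x0, y0, x1, y1, x2, y2.
rows = array("d", [1.0, 4.0, 2.0, 5.0, 3.0, 6.0])
swap_rows_rowwise(rows, 2, 0, 2)
```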

This is why Arrow compute libraries will sometimes convert to a row-based
format for certain operations.
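A toy illustration of that round trip, again in plain Python rather than any
Arrow library: the helper `sort_via_rows` is hypothetical and is not how a
particular Arrow implementation does it. It only shows the shape of the idea,
which is to pack columns into rows, sort the rows, and unpack back to columns.

```python
from array import array


def sort_via_rows(columns, key):
    """Sort columnar data by one column by going through a row-based form."""
    names = list(columns)
    k = names.index(key)
    # Columns -> rows: each row is a tuple holding one value per column.
    rows = sorted(zip(*(columns[n] for n in names)), key=lambda r: r[k])
    # Rows -> columns: rebuild one contiguous buffer per column.
    return {
        n: array(columns[n].typecode, (r[c] for r in rows))
        for c, n in enumerate(names)
    }


points = {"x": array("d", [3.0, 1.0, 2.0]), "y": array("d", [30.0, 10.0, 20.0])}
sorted_points = sort_via_rows(points, "x")
```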

On Thu, Sep 14, 2023, 8:21 AM Andrew Bell <[email protected]> wrote:

> Hi,
>
> We have a data structure that stores points in a point cloud (X, Y, Z,
> attributes) and we have been approached about replacing the current memory
> store with Arrow. The issue is that the current data store also has a set
> of pointers (indirection) that allows for things like subsetting and
> sorting while keeping the data in place. All data is accessed through the
> indirection table. What people typically want is to export one or more of
> these data sets specified by the pointers.
>
> My understanding is that Arrow doesn't support such a scheme as the point
> of the structure is to allow SIMD and other optimizations gained by
> processing contiguous data. Am I missing something in my reading of the
> Arrow docs? Does anyone have thoughts/recommendations, or is Arrow just not
> a good fit for this kind of thing?
>
> Thanks,
>
> --
> Andrew Bell
> [email protected]
>
