I'm not entirely sure what kinds of operations you need.

All Arrow arrays (with the exception of run-end encoded (RLE) arrays)
support constant-time (O(1)) random access, so if you want to keep a pointer
to a particular element or row of data, that is generally fine.
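
As a rough illustration of that point (a sketch, not Arrow's actual API),
the columns below are modeled with Python's standard `array` module as
stand-ins for Arrow arrays, and `order` is a hypothetical indirection table
of row indices. The data and the names `row` and `export` are made up for
the example. Reading a row through the table is one O(1) lookup per column,
and exporting a subset is a gather, which is roughly what pyarrow's
`Table.take` does when given an array of indices.

```python
from array import array

# Stand-ins for three Arrow columns of a point cloud (hypothetical data).
x = array("d", [3.0, 1.0, 2.0])
y = array("d", [30.0, 10.0, 20.0])
z = array("d", [300.0, 100.0, 200.0])
columns = {"x": x, "y": y, "z": z}

# Hypothetical indirection table: row indices ordered by x.
# The column data itself is never moved.
order = array("q", sorted(range(len(x)), key=x.__getitem__))


def row(i):
    """Fetch logical row i through the indirection table (one O(1) read per column)."""
    j = order[i]
    return tuple(col[j] for col in columns.values())


def export(indices):
    """Gather the selected rows into new contiguous columns."""
    return {
        name: array("d", (col[j] for j in indices))
        for name, col in columns.items()
    }


sorted_points = export(order)
```

Only `export` copies data; sorting and subsetting just rewrite `order`.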

On the other hand, you mentioned sorting.  One thing that is a little
challenging in Arrow is swapping two rows of data.  It's entirely possible,
and it has the same algorithmic complexity (O(# columns)) as a row-based
format, but it is not as memory efficient because you are doing a separate
memory swap for each array.
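
To make the difference concrete, here is a minimal sketch that again uses
Python's `array` module in place of real Arrow buffers. Both function names
are hypothetical. The columnar swap touches one buffer per column, while the
row-based swap exchanges two contiguous slices of a single buffer. Both do
O(# columns) work.

```python
from array import array


def swap_rows_columnar(columns, i, j):
    """Swap rows i and j in place: one small swap in each column's buffer."""
    for col in columns:
        col[i], col[j] = col[j], col[i]


def swap_rows_rowwise(buf, width, i, j):
    """Swap rows i and j in a row-major buffer: two contiguous slices."""
    a, b = i * width, j * width
    buf[a:a + width], buf[b:b + width] = buf[b:b + width], buf[a:a + width]


# Columnar layout: three separate buffers, two rows each.
cols = [
    array("d", [1.0, 2.0]),      # x
    array("d", [10.0, 20.0]),    # y
    array("d", [100.0, 200.0]),  # z
]
swap_rows_columnar(cols, 0, 1)

# Row-based layout: one buffer holding (x, y, z) per row.
rows = array("d", [1.0, 10.0, 100.0, 2.0, 20.0, 200.0])
swap_rows_rowwise(rows, 3, 0, 1)
```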

This is why Arrow compute libraries will sometimes convert to a row-based
format for certain operations.

On Thu, Sep 14, 2023, 8:21 AM Andrew Bell <[email protected]> wrote:

> Hi,
>
> We have a data structure that stores points in a point cloud (X, Y, Z,
> attributes) and we have been approached about replacing the current memory
> store with Arrow. The issue is that the current data store also has a set
> of pointers (indirection) that allows for things like subsetting and
> sorting while keeping the data in place. All data is accessed through the
> indirection table. What people typically want is to export one or more of
> these data sets specified by the pointers.
>
> My understanding is that Arrow doesn't support such a scheme as the point
> of the structure is to allow SIMD and other optimizations gained by
> processing contiguous data. Am I missing something in my reading of the
> Arrow docs? Does anyone have thoughts/recommendations, or is Arrow just not
> a good fit for this kind of thing?
>
> Thanks,
>
> --
> Andrew Bell
> [email protected]
>
