I'm not entirely sure what kinds of operations you need. All Arrow arrays (with the exception of RLE) support constant-time (O(1)) random access, so in general, if you want to keep a pointer to a particular element or a row of data, that is fine.
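
A rough sketch of the idea, under loud assumptions: plain Python lists stand in for Arrow arrays, and `get_row`, `sorted_indices`, and `export` are hypothetical helper names, not Arrow functions. It only illustrates that a stored row index can act as the "pointer", and that an index table can be sorted or subsetted while the column data stays in place.

```python
# Plain Python lists stand in for Arrow arrays here; this is NOT the Arrow API.
# One list per column, as in a columnar layout.
columns = {
    "x": [3.0, 1.0, 2.0, 0.5],
    "y": [30.0, 10.0, 20.0, 5.0],
    "z": [300.0, 100.0, 200.0, 50.0],
}


def get_row(cols, i):
    """Random access to row i: one O(1) lookup per column."""
    return tuple(col[i] for col in cols.values())


def sorted_indices(cols, key):
    """Build an index table ordered by one column; the data is not moved."""
    return sorted(range(len(cols[key])), key=lambda i: cols[key][i])


def export(cols, indices):
    """Gather the selected rows into new contiguous columns."""
    return {name: [col[i] for i in indices] for name, col in cols.items()}


# A stored row index plays the role of a pointer to a point.
handle = 2
print(get_row(columns, handle))

# Sort and subset through the index table only, then export the result.
order = sorted_indices(columns, "x")
subset = [i for i in order if columns["x"][i] >= 1.0]
exported = export(columns, subset)
print(order, subset, exported["x"])
```

Arrow's compute libraries offer kernels in this spirit (for example `sort_indices` and `take` in pyarrow's compute module), though the exact names and signatures depend on the language binding, so check the docs for yours.
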
On the other hand, you mentioned sorting. One thing that is a little challenging in Arrow is swapping two rows of data. It is certainly possible, and it has the same algorithmic complexity (O(# columns)) as a row-based format, but it is not as memory efficient because you are doing a separate memory swap for each array. This is why Arrow compute libraries will sometimes convert to a row-based format for certain operations.

On Thu, Sep 14, 2023, 8:21 AM Andrew Bell <[email protected]> wrote:

> Hi,
>
> We have a data structure that stores points in a point cloud (X, Y, Z,
> attributes) and we have been approached about replacing the current memory
> store with Arrow. The issue is that the current data store also has a set
> of pointers (indirection) that allows for things like subsetting and
> sorting while keeping the data in place. All data is accessed through the
> indirection table. What people typically want is to export one or more of
> these data sets specified by the pointers.
>
> My understanding is that Arrow doesn't support such a scheme as the point
> of the structure is to allow SIMD and other optimizations gained by
> processing contiguous data. Am I missing something in my reading of the
> Arrow docs? Does anyone have thoughts/recommendations, or is Arrow just not
> a good fit for this kind of thing?
>
> Thanks,
>
> --
> Andrew Bell
> [email protected]
>
