GitHub user rok added a comment to the discussion: Handling numpy ndarray or 
tensor objects with atleast 1 dimension having variable size

> I use pyarrow.ListType instead of pyarrow.ListType[n] because I don't have to 
> carry around the information of n when I'm saving or loading the data. It's 
> not anything complicated at all but maybe one or two lines of code less. Does 
> this have any other bad side effects?

`FixedSizeListArray` is more memory efficient than
[ListArray](https://arrow.apache.org/docs/format/Columnar.html#variable-size-list-layout)
because it doesn't need an offsets buffer. It is also what the
[VariableShapeTensorArray specification](https://arrow.apache.org/docs/format/CanonicalExtensions.html#variable-shape-tensor)
uses for storing shapes, so if you switch to `VariableShapeTensorArray` at some
point, you might want to already be using the same memory layout.
That said, your `shapes` array will probably be small compared to your `values`
array, so optimizing it is unlikely to matter much.

> in creating `dims_np` can I not use `dims_np = shapes[i].values.to_numpy()` 
> instead of creating an `Array` and doing the same for the first element?

Sorry, my example was not great. `dims_np = shapes[i].values.to_numpy()` is 
definitely better and should be zero-copy.

GitHub link: 
https://github.com/apache/arrow/discussions/48099#discussioncomment-14958901
