GitHub user rok added a comment to the discussion: Handling numpy ndarray or tensor objects with atleast 1 dimension having variable size
> I use pyarrow.ListType instead of pyarrow.ListType[n] because I don't have to
> carry around the information of n when I'm saving or loading the data. It's
> not anything complicated at all but maybe one or two lines of code less. Does
> this have any other bad side effects?

FixedSizeListArray is more memory efficient (it doesn't require an offsets buffer like the [ListArray](https://arrow.apache.org/docs/format/Columnar.html#variable-size-list-layout)), and we use FixedSizeListArray in the [VariableShapeTensorArray specification](https://arrow.apache.org/docs/format/CanonicalExtensions.html#variable-shape-tensor) for storing shapes. So if you switch to `VariableShapeTensorArray` at some point, you might want to use the same memory layout. Since your `shapes` will probably be relatively small compared to your `values` array, optimizing it probably won't matter much, though.

> in creating `dims_np` can I not use `dims_np = shapes[i].values.to_numpy()`
> instead of creating an `Array` and doing the same for the first element?

Sorry, my example was not great. `dims_np = shapes[i].values.to_numpy()` is definitely better and should be zero-copy.

GitHub link: https://github.com/apache/arrow/discussions/48099#discussioncomment-14958901
