GitHub user rok added a comment to the discussion: Handling numpy ndarray or tensor objects with atleast 1 dimension having variable size
Hey @RayZ0rr! Nice to see there's interest in using Arrow for this.

> * For each variable shape tensor keep two columns one of type
> pyarrow.ListType with the child type same as dtype of the tensor and other
> column of type pyarrow.ListType with child as int32.

In `VariableShapeTensor` we [specify](https://arrow.apache.org/docs/format/CanonicalExtensions.html#variable-shape-tensor) that shapes are stored in `pyarrow.ListType[n]`, where n would be 2 for your case. From your snippet I can't tell whether you use `pyarrow.ListType[n]` or `pyarrow.ListType`.

> * Using the following function we can get a list of original variable shape
> tensors back. There is a more efficient way to do this if the full tensor
> fits in memory.

In the [VariableShapeTensor Python PR](https://github.com/apache/arrow/blob/55cc40b92e016f06a515391e01390886fd94b514/python/pyarrow/array.pxi#L4757C57-L4761) we propose `from_numpy_ndarray`, which does the reverse of what you want to do. For your case I would check that no copies occur in the `reshape`. I would also create and use `dims_np` like so, to avoid both copying and pure-Python code:

```python
dims_np = pa.array(shapes[i], pa.list_(pa.int32(), 2))[0].values.to_numpy()
o = data_np.reshape(dims_np)
```

GitHub link: https://github.com/apache/arrow/discussions/48099#discussioncomment-14948683
