Re: [D] Handling numpy ndarray or tensor objects with atleast 1 dimension having variable size [arrow]

via GitHub Wed, 12 Nov 2025 06:54:35 -0800


GitHub user rok added a comment to the discussion: Handling numpy ndarray or 
tensor objects with atleast 1 dimension having variable size


Hey @RayZ0rr! Nice to see there's interest in using Arrow for this.

> * For each variable shape tensor keep two columns one of type 
> pyarrow.ListType with the child type same as dtype of the tensor and other 
> column of type pyarrow.ListType with child as int32.

In `VariableShapeTensor` we 
[specify](https://arrow.apache.org/docs/format/CanonicalExtensions.html#variable-shape-tensor)
 that shapes are stored in `pyarrow.ListType[n]`, i.e. a fixed-size list of 
`int32` of length n (what `pa.list_(pa.int32(), n)` creates), where n=2 for 
your case. From your snippet I can't tell whether you use the fixed-size 
`pyarrow.ListType[n]` or a plain variable-length `pyarrow.ListType`.

> * Using the following function we can get a list of original variable shape 
> tensors back. There is a more efficient way to do this if the full tensor 
> fits in memory.

In the [VariableShapeTensor Python 
PR](https://github.com/apache/arrow/blob/55cc40b92e016f06a515391e01390886fd94b514/python/pyarrow/array.pxi#L4757C57-L4761)
 we propose `from_numpy_ndarray`, which does the reverse of what you want to do. 
For your case I would check that no copies occur in the `reshape`. I would 
also create and use `dims_np` like so, to avoid both copying and pure-Python 
code:

```python
dims_np = pa.array(shapes[i], pa.list_(pa.int32(), 2))[0].values.to_numpy()
o = data_np.reshape(dims_np)
```

GitHub link: 
https://github.com/apache/arrow/discussions/48099#discussioncomment-14948683

