[D] Handling numpy ndarray or tensor objects with at least 1 dimension having variable size [arrow]

via GitHub Tue, 11 Nov 2025 09:23:47 -0800


GitHub user RayZ0rr created a discussion: Handling numpy ndarray or tensor
objects with at least 1 dimension having variable size


I want to store objects like NumPy ndarrays or PyTorch tensors that have at
least one dimension whose size varies. For example, consider a list of 2D
point clouds: each point cloud has shape (N, 2), where `N` can differ from one
point cloud to the next.

[`pyarrow.FixedShapeTensorType`](https://arrow.apache.org/docs/python/generated/pyarrow.FixedShapeTensorType.html)
doesn't work for this use case, and the `VariableShapeTensor` work
([1](https://github.com/apache/arrow/pull/40354) and
[2](https://github.com/apache/arrow/issues/38007)) has not been merged yet.
While waiting for it, I have implemented the following layout, which allows
zero-copy retrieval of the original list of variable-shape tensors from the
pyarrow table.

- For each variable-shape tensor, keep two columns: one of type
`pyarrow.ListType` whose child type matches the tensor's `dtype`, and another
of type `pyarrow.ListType` with an `int32` child.
- For example, call the first column `"points_val"` and the second
`"points_shape"`. Each element of `"points_val"` is the flattened list of
values of a single tensor (`view(-1)` or `reshape(-1)`), and each element of
`"points_shape"` holds that tensor's shape.
- The following function recovers the original list of variable-shape
tensors. There is a more efficient way to do this if the full tensor fits in
memory.
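```python
import numpy as np
import pyarrow as pa

def getTensors(table: pa.Table) -> list[np.ndarray]:
    vals = table["points_val"]      # ChunkedArray of list<dtype>
    shapes = table["points_shape"]  # ChunkedArray of list<int32>
    out = []
    for i in range(len(vals)):
        # ListScalar.values is the child-array slice for row i;
        # to_numpy() returns a zero-copy view for primitive types without nulls.
        data_np = vals[i].values.to_numpy()
        # The shape list is tiny, so converting it to Python ints is cheap.
        dims = tuple(shapes[i].as_py())
        out.append(data_np.reshape(dims))
    return out
```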


Does anyone know of a better way, or see a reason this is not zero-copy?

GitHub link: https://github.com/apache/arrow/discussions/48099
