If I'm not mistaken, what you want is basically an extension type [1] for 
tensors, so you can have a column where each row contains a tensor/matrix. This 
has been discussed for quite some time [2].

Incidentally, you can keep the three-field representation but pack it into a 
single top-level field of Struct type. 

[1]: https://arrow.apache.org/docs/python/extending_types.html
[2]: https://issues.apache.org/jira/browse/ARROW-1614

On Wed, Jul 6, 2022, at 19:01, dl via user wrote:
> I have tabular data with one record field of type scipy.sparse.csr_matrix. I 
> want to convert this tabular data to a pyarrow table. I had been 
> converting the csr_matrix first to a custom representation using three fields 
> (shape, keys, indices) and building the pyarrow table using a schema with the 
> types of these fields and table data with a separate list for each field (and 
> each list having one entry per input record). I was hoping I could use a 
> single pyarrow.SparseCSRMatrix field instead of the custom three-field 
> representation. Is that possible? Incidentally, the shape of the csr_matrix 
> is typically (1,N) where N may vary for different records. But I don't think 
> "typically (1,N)" matters. It would work with variable shape (M,N). The shape 
> field has type pyarrow.List with value_type = pyarrow.int32().
> 
> 
> On 7/6/2022 2:53 PM, Rok Mihevc wrote:
>> Hey David, 
>> 
>> I don't think Table is designed in a way that you could "populate" it with a 
>> 2D tensor. It should rather be populated with a collection of equal length 
>> arrays.
>> Sparse CSR tensor on the other hand is composed of three arrays (indices, 
>> indptr, values) and you need a bit more involved logic to manipulate those 
>> than regular arrays. See [1] for memory layout definition.
>> 
>> What are you looking to accomplish? What access patterns are you expecting?
>> 
>> Rok
>> 
>> [1] https://github.com/apache/arrow/blob/master/format/SparseTensor.fbs
>> 
>> On Wed, Jul 6, 2022 at 10:48 PM dl <dydx...@yahoo.com> wrote:
>>> Hi Rok,
>>> 
>>> What data type would I use for a pyarrow SparseCSRMatrix in a schema? I 
>>> need to build a table with rows which include a field of this type. I don't 
>>> see a related example in the test module. I'm doing something like:
>>> 
>>> schema = pyarrow.schema(fields, metadata=metadata)
>>> table = pyarrow.Table.from_arrays(table_data, schema=schema)
>>> 
>>> where fields is a list of tuples of the form (field_name, pyarrow_type), 
>>> e.g. ('field1', pyarrow.string()). What should pyarrow_type be for a 
>>> SparseCSRMatrix field? Or will this not work?
>>> 
>>> Thanks,
>>> David
>>> 
>>> 
>>> 
>>> On 7/1/2022 9:18 AM, Rok Mihevc wrote:
>>>> We lack pyarrow sparse tensor documentation (PRs welcome), so the tests are 
>>>> perhaps the most extensive description of what is doable: 
>>>> https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_sparse_tensor.py
>>>>  
>>>> 
>>>> Rok
>>>> 
>>>> On Fri, Jul 1, 2022 at 5:38 PM dl via user <user@arrow.apache.org> wrote:
>>>>> So, I guess this is supported in 8.0.0. I can do this:
>>>>> 
>>>>> import numpy as np
>>>>> import pyarrow as pa
>>>>> from scipy.sparse import csr_matrix
>>>>> 
>>>>> a = np.random.rand(100)
>>>>> a[a < .9] = 0.0
>>>>> s = csr_matrix(a)
>>>>> arrow_sparse_csr_matrix = pa.SparseCSRMatrix.from_scipy(s)
>>>>> 
>>>>> Now, how do I use that to build a pyarrow table? Stay tuned...
>>>>> 
>>>>> 
>>>>> On 7/1/2022 8:19 AM, dl wrote:
>>>>>> I find pyarrow.SparseCSRMatrix mentioned here 
>>>>>> <https://arrow.apache.org/docs/python/integration/extending.html?highlight=sparse#pyarrow.pyarrow_wrap_sparse_csr_matrix>.
>>>>>>  But how do I use that? Is there documentation for that class?
>>>>>> 
>>>>>> 
>>>>>> On 7/1/2022 7:47 AM, dl wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I'm trying to understand support for sparse tensors in Arrow. It looks 
>>>>>>> like there is "experimental" support using the C++ API 
>>>>>>> <https://arrow.apache.org/docs/cpp/api/tensor.html?highlight=sparse#sparse-tensors>.
>>>>>>>  When was this introduced? I see in the code base here 
>>>>>>> <https://github.com/apache/arrow/blob/master/python/pyarrow/tensor.pxi> 
>>>>>>> Cython sparse array classes. Can these be accessed using the Python 
>>>>>>> API? Are they included in the 8.0.0 release? Is there any other support 
>>>>>>> for sparse arrays/tensors in the Python API? Are there good examples 
>>>>>>> for any of this, in particular for using the 8.0.0 Python API to create 
>>>>>>> sparse tensors?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> David
>>>>>>> 
>>>>>>> 
