That sounds like the case David Li is describing. You can use
SparseCSRMatrix as a field value, but you have to introduce an extension
type for it [1]. Best see David's suggestion.

[1]: https://arrow.apache.org/docs/python/extending_types.html

On Thu, Jul 7, 2022 at 3:58 PM dl <dydx...@yahoo.com> wrote:

> Thanks. That helps.
>
> Can SparseCSRMatrix be used the way I'm trying to use it, as a field value
> in a table? I think that would need a DataType associated with it to give
> the field.
>
> On 7/6/2022 6:25 PM, Rok Mihevc wrote:
>
> arrow_sparse_csr_matrix.to_numpy() - will return underlying csr components
> arrow_sparse_csr_matrix.to_tensor().to_numpy() - should return a dense
> version of original matrix
>
> On Thu, Jul 7, 2022 at 3:12 AM dl <dydx...@yahoo.com> wrote:
>
>> Minor separate question. The method pyarrow.SparseCSRMatrix.to_numpy()
>> doesn't seem to preserve the shape of the matrix. Am I wrong? For example
>> using the code from my original message, printing the result of
>> arrow_sparse_csr_matrix.to_numpy() in one case gives:
>>
>> (array([[0.91263427],
>>        [0.98520395],
>>        [0.98082576],
>>        [0.97490447],
>>        [0.94312307],
>>        [0.90573414],
>>        [0.95057244],
>>        [0.94955576],
>>        [0.90342821]]),
>>  array([0, 9], dtype=int64),
>>  array([ 0,  4, 33, 38, 46, 49, 61, 64, 83], dtype=int64))
>>
>> vs.
>>
>> >>> acsr.shape
>> (1, 100)
>>
>>
>> On 7/6/2022 4:01 PM, dl wrote:
>>
>> I have tabular data with one record field of type
>> scipy.sparse.csr_matrix. I want to convert this tabular data to a pyarrow
>> table. I had been converting the csr_matrix first to a custom
>> representation using three fields (shape, keys, indices) and building the
>> pyarrow table using a schema with the types of these fields and table data
>> with a separate list for each field (and each list having one entry per
>> input record). I was hoping I could use a single pyarrow.SparseCSRMatrix
>> field instead of the custom three field representation. Is that possible?
>> Incidentally, the shape of the csr_matrix is typically (1,N) where N may
>> vary for different records. But I don't think "typically (1,N)" matters. It
>> would work with variable shape (M,N). The shape field has type pyarrow.List
>> with value_type = pyarrow.int32().
>>
>> On 7/6/2022 2:53 PM, Rok Mihevc wrote:
>>
>> Hey David,
>>
>> I don't think Table is designed in a way that you could "populate" it
>> with a 2D tensor. It should rather be populated with a collection of equal
>> length arrays.
>> Sparse CSR tensor on the other hand is composed of three arrays (indices,
>> indptr, values) and you need a bit more involved logic to manipulate those
>> than regular arrays. See [1] for memory layout definition.
>>
>> What are you looking to accomplish? What access patterns are you
>> expecting?
>>
>> Rok
>>
>> [1] https://github.com/apache/arrow/blob/master/format/SparseTensor.fbs
>>
>> On Wed, Jul 6, 2022 at 10:48 PM dl <dydx...@yahoo.com> wrote:
>>
>>> Hi Rok,
>>>
>>> What data type would I use for a pyarrow SparseCSRMatrix in a schema? I
>>> need to build a table with rows which include a field of this type. I don't
>>> see a related example in the test module. I'm doing something like:
>>>
>>> schema = pyarrow.schema(fields, metadata=metadata)
>>> table = pyarrow.Table.from_arrays(table_data, schema=schema)
>>>
>>> where fields is a list of tuples of the form (field_name, pyarrow_type),
>>> e.g. ('field1', pyarrow.string()). What should pyarrow_type be for a
>>> SparseCSRMatrix field? Or will this not work?
>>>
>>> Thanks,
>>> David
>>>
>>>
>>> On 7/1/2022 9:18 AM, Rok Mihevc wrote:
>>>
>>> We lack pyarrow sparse tensor documentation (PRs welcome), so the tests are
>>> perhaps the most extensive description of what is doable:
>>> https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_sparse_tensor.py
>>>
>>> Rok
>>>
>>> On Fri, Jul 1, 2022 at 5:38 PM dl via user <user@arrow.apache.org>
>>> wrote:
>>>
>>>> So, I guess this is supported in 8.0.0. I can do this:
>>>>
>>>> import numpy as np
>>>> import pyarrow as pa
>>>> from scipy.sparse import csr_matrix
>>>>
>>>> a = np.random.rand(100)
>>>> a[a < .9] = 0.0
>>>> s = csr_matrix(a)
>>>> arrow_sparse_csr_matrix = pa.SparseCSRMatrix.from_scipy(s)
>>>>
>>>> Now, how do I use that to build a pyarrow table? Stay tuned...
>>>>
>>>> On 7/1/2022 8:19 AM, dl wrote:
>>>>
>>>> I find pyarrow.SparseCSRMatrix mentioned here
>>>> <https://arrow.apache.org/docs/python/integration/extending.html?highlight=sparse#pyarrow.pyarrow_wrap_sparse_csr_matrix>.
>>>> But how do I use that? Is there documentation for that class?
>>>>
>>>> On 7/1/2022 7:47 AM, dl wrote:
>>>>
>>>>
>>>> Hi,
>>>>
>>>> I'm trying to understand support for sparse tensors in Arrow. It looks
>>>> like there is "experimental" support using the C++ API
>>>> <https://arrow.apache.org/docs/cpp/api/tensor.html?highlight=sparse#sparse-tensors>.
>>>> When was this introduced? I see in the code base here
>>>> <https://github.com/apache/arrow/blob/master/python/pyarrow/tensor.pxi>
>>>> Cython sparse array classes. Can these be accessed using the Python API?
>>>> Are they included in the 8.0.0 release? Is there any other support for
>>>> sparse arrays/tensors in the Python API? Are there good examples for any of
>>>> this, in particular for using the 8.0.0 Python API to create sparse 
>>>> tensors?
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>
