If you're starting with a single (1, N) scipy.sparse.csr_matrix and just want
to get an Arrow array, you can also do:

import pyarrow as pa
from scipy.sparse import csr_matrix

scipy_csr_matrix = csr_matrix((data, indices, indptr), shape=shape)
sparse_tensor = pa.SparseCSRMatrix.from_scipy(scipy_csr_matrix)
arr = pa.array(sparse_tensor.to_tensor().to_numpy()[0])  # densify, keep row 0

But that assumes a single row (shape (1, N)) and goes through a dense
representation.

On Thu, Jul 7, 2022 at 1:27 AM David Li <lidav...@apache.org> wrote:

> If I'm not mistaken, what you want is basically an extension type [1] for
> tensors, so you can have a column where each row contains a tensor/matrix.
> This has been discussed for quite some time [2].
>
> Incidentally, you can keep the three-field representation but pack it into
> a single top-level field with the Struct type.
>
> [1]: https://arrow.apache.org/docs/python/extending_types.html
> [2]: https://issues.apache.org/jira/browse/ARROW-1614
>
> On Wed, Jul 6, 2022, at 19:01, dl via user wrote:
>
> I have tabular data with one record field of type scipy.sparse.csr_matrix.
> I want to convert this tabular data to a pyarrow table. I had been
> converting the csr_matrix to a custom representation using three
> fields (shape, keys, indices) and building the pyarrow table using a schema
> with the types of these fields and table data with a separate list for each
> field (each list having one entry per input record). I was hoping I
> could use a single pyarrow.SparseCSRMatrix field instead of the custom
> three-field representation. Is that possible? Incidentally, the shape of
> the csr_matrix is typically (1,N), where N may vary between records.
> But I don't think "typically (1,N)" matters; it would work with variable
> shape (M,N). The shape field has type pyarrow.List with value_type =
> pyarrow.int32().
>
>
> On 7/6/2022 2:53 PM, Rok Mihevc wrote:
>
> Hey David,
>
> I don't think Table is designed in a way that you could "populate" it with
> a 2D tensor. It should rather be populated with a collection of
> equal-length arrays.
> A sparse CSR tensor, on the other hand, is composed of three arrays
> (indices, indptr, values), and manipulating those takes more involved logic
> than regular arrays. See [1] for the memory layout definition.
>
> What are you looking to accomplish? What access patterns are you expecting?
>
> Rok
>
> [1] https://github.com/apache/arrow/blob/master/format/SparseTensor.fbs
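To illustrate why the three CSR arrays need more involved logic than a plain array, here is a pure-Python sketch that expands one row of a CSR matrix. The helper name `csr_row_to_dense` and the 3 x 4 example matrix are hypothetical; the layout follows the usual CSR convention, where `indptr[r]:indptr[r + 1]` delimits row r's slice of `indices` and `values`.

```python
def csr_row_to_dense(indptr, indices, values, n_cols, row):
    """Expand one row of a CSR matrix into a dense Python list."""
    dense = [0.0] * n_cols
    # Row `row` owns positions indptr[row] .. indptr[row + 1] - 1.
    for k in range(indptr[row], indptr[row + 1]):
        dense[indices[k]] = values[k]
    return dense


# 3 x 4 matrix:
# [[1, 0, 0, 2],
#  [0, 0, 0, 0],
#  [0, 3, 0, 0]]
indptr = [0, 2, 2, 3]
indices = [0, 3, 1]
values = [1.0, 2.0, 3.0]

for r in range(3):
    print(csr_row_to_dense(indptr, indices, values, 4, r))
```

The empty middle row shows up only as a repeated entry in `indptr` (2, 2), which is why a row cannot be located without consulting `indptr` first.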
>
> On Wed, Jul 6, 2022 at 10:48 PM dl <dydx...@yahoo.com> wrote:
>
> Hi Rok,
>
> What data type would I use for a pyarrow SparseCSRMatrix in a schema? I
> need to build a table with rows which include a field of this type. I don't
> see a related example in the test module. I'm doing something like:
>
> schema = pyarrow.schema(fields, metadata=metadata)
> table = pyarrow.Table.from_arrays(table_data, schema=schema)
>
> where fields is a list of tuples of the form (field_name, pyarrow_type),
> e.g. ('field1', pyarrow.string()). What should pyarrow_type be for a
> SparseCSRMatrix field? Or will this not work?
>
> Thanks,
> David
>
>
>
> On 7/1/2022 9:18 AM, Rok Mihevc wrote:
>
> We lack pyarrow sparse tensor documentation (PRs welcome), so the tests are
> perhaps the most extensive description of what is doable:
> https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_sparse_tensor.py
>
> Rok
>
> On Fri, Jul 1, 2022 at 5:38 PM dl via user <user@arrow.apache.org> wrote:
>
> So, I guess this is supported in 8.0.0. I can do this:
>
> import numpy as np
> import pyarrow as pa
> from scipy.sparse import csr_matrix
>
> a = np.random.rand(100)
> a[a < .9] = 0.0
> s = csr_matrix(a)
> arrow_sparse_csr_matrix = pa.SparseCSRMatrix.from_scipy(s)
>
> Now, how do I use that to build a pyarrow table? Stay tuned...
>
>
> On 7/1/2022 8:19 AM, dl wrote:
>
> I find pyarrow.SparseCSRMatrix mentioned here
> <https://arrow.apache.org/docs/python/integration/extending.html?highlight=sparse#pyarrow.pyarrow_wrap_sparse_csr_matrix>.
> But how do I use that? Is there documentation for that class?
>
>
> On 7/1/2022 7:47 AM, dl wrote:
>
>
> Hi,
>
> I'm trying to understand support for sparse tensors in Arrow. It looks
> like there is "experimental" support using the C++ API
> <https://arrow.apache.org/docs/cpp/api/tensor.html?highlight=sparse#sparse-tensors>.
> When was this introduced? I see Cython sparse array classes in the code base
> here <https://github.com/apache/arrow/blob/master/python/pyarrow/tensor.pxi>.
> Can these be accessed using the Python API?
> Are they included in the 8.0.0 release? Is there any other support for
> sparse arrays/tensors in the Python API? Are there good examples for any of
> this, in particular for using the 8.0.0 Python API to create sparse tensors?
>
> Thanks,
> David
>
>
>
>
