Alex Favaro created SPARK-31122:
-----------------------------------

             Summary: Add support for sparse matrix multiplication
                 Key: SPARK-31122
                 URL: https://issues.apache.org/jira/browse/SPARK-31122
             Project: Spark
          Issue Type: New Feature
          Components: MLlib
    Affects Versions: 2.4.5
            Reporter: Alex Favaro


MLlib does not currently support multiplication of sparse matrices. When 
multiplying block matrices with sparse blocks, the sparse blocks are first 
converted to dense matrices. This leads to large increases in memory 
utilization for certain problems.

I'd like to propose adding support for local sparse matrix multiplication to 
MLlib, as well as local dense-sparse matrix multiplication. With these changes, 
the case clause which converts sparse blocks to dense matrices in the block 
matrix multiply method could be removed.

One question is whether the result of sparse-sparse matrix multiplication 
should be sparse or dense, since the product of two sparse matrices can be 
quite dense depending on the matrices. I propose returning a sparse matrix, 
however, and letting the application convert the result to a dense matrix if 
necessary. There is some precedent for this with the block matrix add method, 
which returns sparse matrix blocks even when adding a sparse matrix block to a 
dense matrix block.

As for the implementation, one option would be to leverage Breeze's existing 
sparse matrix multiplication, as MLlib currently does for matrix addition. 
Another would be to add support for sparse-sparse multiplication to the BLAS 
wrapper, which would be consistent with the sparse-dense multiplication 
implementation and could support a more efficient routine for transposed 
matrices (as Breeze does not support transposed matrices). The exact algorithm 
would follow that laid out in ["Sparse Matrix Multiplication Package 
(SMMP)"|https://www.i2m.univ-amu.fr/perso/abdallah.bradji/multp_sparse.pdf].

This would likely not be a huge change but would take some time to test and 
benchmark properly, so before I put up a code diff I would be curious to know:
* Is there any interest in supporting this functionality in MLlib?
* Is there a preference for the return type of sparse-sparse multiplication? 
(i.e. sparse or dense)
* Is there a preference for the implementation? (Breeze vs a built-in one)

Some tickets which include related functionality or identified this particular 
issue but never solved it: SPARK-16820, SPARK-3418.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to