GitHub user ghoto opened a pull request:

    https://github.com/apache/spark/pull/17940

    Bug fix/spark 20687

    ## What changes were proposed in this pull request?
    
    Bugfix for https://issues.apache.org/jira/browse/SPARK-20687
    
    Before converting a Breeze CSCMatrix to a Matrix, the trailing buffer 0s that
    Breeze adds to rowIndices and data are removed to avoid inconsistencies with
    colPtrs. These trailing buffers are often generated by operations between
    matrices, such as addition or subtraction. Without this fix, valid
    BlockMatrix.add and BlockMatrix.subtract operations throw exceptions, because
    blocks are stored as SparseMatrix, converted to Breeze, and converted back to
    SparseMatrix.
    
    
http://stackoverflow.com/questions/33528555/error-thrown-when-using-blockmatrix-add
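
A minimal sketch of the truncation idea, written in plain Python rather than the Scala used by the patch. The helper name `compact_csc` and the sample arrays are hypothetical and are not part of Spark or Breeze. The sketch assumes only the standard CSC layout, where the last entry of colPtrs gives the number of stored elements.

```python
def compact_csc(col_ptrs, row_indices, data):
    """Drop trailing buffer entries so both arrays hold exactly col_ptrs[-1] items.

    Hypothetical helper for illustration; not the code from the patch.
    """
    active = col_ptrs[-1]  # number of stored entries according to the column pointers
    if len(row_indices) < active or len(data) < active:
        raise ValueError("rowIndices/data are shorter than the active entry count")
    return row_indices[:active], data[:active]


# A 3x2 matrix with 3 stored entries whose arrays were padded to capacity 5.
col_ptrs = [0, 2, 3]
row_indices = [0, 2, 1, 0, 0]       # last two entries are buffer padding
data = [1.0, 2.0, 3.0, 0.0, 0.0]    # last two entries are buffer padding

rows, values = compact_csc(col_ptrs, row_indices, data)
```

After the call, `rows` and `values` have the same length as `col_ptrs[-1]`, which is the consistency between rowIndices, data, and colPtrs that the description refers to.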
    
    ## How was this patch tested?
    
    Added a test to MatricesSuite that verifies that the conversions are valid
    and that the code doesn't crash. Before this fix, the same code crashed in Spark.
    
    Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ghoto/spark bug-fix/SPARK-20687

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17940.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17940
    
----
commit 62d78a241c95d09896b731776e29a8cb883dfc49
Author: Ignacio Bermudez <ignaciobermu...@gmail.com>
Date:   2017-05-10T04:31:03Z

    Reproducing SPARK-20687

commit dbbd39121f3210f6edd7a74bb21853fbda20c0cb
Author: Ignacio Bermudez <ignaciobermu...@gmail.com>
Date:   2017-05-10T18:03:14Z

    [SPARK-20687] mllib.Matrices.fromBreeze may cause crash when converting 
breeze CSCMatrix
    
    In an operation on two CSCMatrices A and B, the resulting matrix C may have
    some extra 0s in rowIndices and data, which Breeze creates as a performance
    optimization. This causes problems when converting back to mllib.Matrix,
    because the conversion relies on rowIndices and data being coherent with
    colPtrs. Therefore it is necessary to truncate rowIndices and data to the
    number of active elements held by C before creating a Spark SparseMatrix.

----

