GitHub user ghoto opened a pull request: https://github.com/apache/spark/pull/17940

    Bug fix/spark 20687

    ## What changes were proposed in this pull request?

    Bugfix for https://issues.apache.org/jira/browse/SPARK-20687

    Before a Breeze CSCMatrix is converted to a Matrix, the trailing buffer 0s that Breeze appends to rowIndices and data are removed to avoid inconsistencies with colPtrs.

    Note that these trailing buffers are often generated by operations between matrices, such as addition or subtraction. Without this fix, valid BlockMatrix.add and BlockMatrix.subtract operations therefore throw exceptions, because blocks are stored as SparseMatrix, converted to Breeze, and converted back to SparseMatrix.

    http://stackoverflow.com/questions/33528555/error-thrown-when-using-blockmatrix-add

    ## How was this patch tested?

    Added a test to MatricesSuite that verifies that the conversions are valid and that the code doesn't crash. Originally the same code would crash on Spark.

    Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ghoto/spark bug-fix/SPARK-20687

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17940.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17940

----
commit 62d78a241c95d09896b731776e29a8cb883dfc49
Author: Ignacio Bermudez <ignaciobermu...@gmail.com>
Date:   2017-05-10T04:31:03Z

    Reproducing SPARK-20687

commit dbbd39121f3210f6edd7a74bb21853fbda20c0cb
Author: Ignacio Bermudez <ignaciobermu...@gmail.com>
Date:   2017-05-10T18:03:14Z

    [SPARK-20687] mllib.Matrices.fromBreeze may cause crash when converting breeze CSCMatrix

    In an operation on two CSCMatrices A and B, the resulting matrix C may have some extra 0s in rowIndices and data, which Breeze creates as a performance optimization.
    This causes problems when converting back to mllib.Matrix, because the conversion relies on rowIndices and data being consistent with colPtrs. It is therefore necessary to truncate rowIndices and data to the number of active elements held by C before creating a Spark SparseMatrix.

----
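
The truncation described in this pull request can be illustrated with a small, dependency-free sketch. It is written in plain Python rather than Scala so that it runs without Spark or Breeze, and it is not the code from the patch. The helper names `validate_csc` and `compact_csc` are hypothetical, and the checks in `validate_csc` are only assumed to resemble the consistency requirements that a strict sparse-matrix constructor enforces.

```python
from typing import List, Tuple


def validate_csc(num_cols: int, col_ptrs: List[int],
                 row_indices: List[int], values: List[float]) -> None:
    """Hypothetical consistency checks for compressed sparse column arrays."""
    if len(col_ptrs) != num_cols + 1:
        raise ValueError("col_ptrs must have num_cols + 1 entries")
    if len(row_indices) != len(values):
        raise ValueError("row_indices and values must have the same length")
    # The last column pointer is the number of stored (active) entries.
    if col_ptrs[-1] != len(values):
        raise ValueError("last column pointer must equal the number of values")


def compact_csc(col_ptrs: List[int], row_indices: List[int],
                values: List[float]) -> Tuple[List[int], List[float]]:
    """Drop trailing buffer slots so both arrays hold only active entries."""
    active = col_ptrs[-1]
    return row_indices[:active], values[:active]


if __name__ == "__main__":
    # A 2x2 matrix with entries (0,0)=1.0 and (1,1)=2.0. The arrays carry
    # two spare trailing slots, as a growable buffer might leave behind.
    col_ptrs = [0, 1, 2]
    padded_rows = [0, 1, 0, 0]
    padded_vals = [1.0, 2.0, 0.0, 0.0]

    try:
        validate_csc(2, col_ptrs, padded_rows, padded_vals)
    except ValueError as err:
        print("padded arrays rejected:", err)

    rows, vals = compact_csc(col_ptrs, padded_rows, padded_vals)
    validate_csc(2, col_ptrs, rows, vals)  # passes after truncation
    print("compacted:", rows, vals)
```

The point of the sketch is that the last column pointer, not the length of the backing arrays, gives the number of stored entries, so slicing both arrays to that length restores the invariant the conversion relies on.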