GitHub user johnc1231 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17459#discussion_r117620801
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/IndexedRowMatrix.scala
---
@@ -108,8 +108,64 @@ class IndexedRowMatrix @Since("1.0.0") (
*/
@Since("1.3.0")
def toBlockMatrix(rowsPerBlock: Int, colsPerBlock: Int): BlockMatrix = {
- // TODO: This implementation may be optimized
- toCoordinateMatrix().toBlockMatrix(rowsPerBlock, colsPerBlock)
+ require(rowsPerBlock > 0,
+ s"rowsPerBlock needs to be greater than 0. rowsPerBlock: $rowsPerBlock")
+ require(colsPerBlock > 0,
+ s"colsPerBlock needs to be greater than 0. colsPerBlock: $colsPerBlock")
+
+ val m = numRows()
+ val n = numCols()
+ val lastRowBlockIndex = m / rowsPerBlock
+ val lastColBlockIndex = n / colsPerBlock
+ val lastRowBlockSize = (m % rowsPerBlock).toInt
+ val lastColBlockSize = (n % colsPerBlock).toInt
+ val numRowBlocks = math.ceil(m.toDouble / rowsPerBlock).toInt
+ val numColBlocks = math.ceil(n.toDouble / colsPerBlock).toInt
+
+ val blocks = rows.flatMap { ir: IndexedRow =>
+ val blockRow = ir.index / rowsPerBlock
+ val rowInBlock = ir.index % rowsPerBlock
+
+ ir.vector match {
+ case SparseVector(size, indices, values) =>
+ indices.zip(values).map { case (index, value) =>
+ val blockColumn = index / colsPerBlock
--- End diff --
It is true that IndexedRowMatrix can have a Long number of rows, but
BlockMatrix is backed by an RDD of ((Int, Int), Matrix), so block indices
are limited to the Int range. I can add a check up front that computes
whether a BlockMatrix can be built from the given IndexedRowMatrix.
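
A minimal sketch of what such a check might look like, written as
standalone Scala with no Spark dependency. The object name BlockGridCheck
and the helpers numBlocks and fitsInBlockMatrix are hypothetical and are
not part of this PR. The assumption is that the conversion is possible
when the number of row blocks and the number of column blocks each fit in
an Int.

```scala
// Hypothetical helper, not part of the PR: decides whether Long-sized
// dimensions can be tiled into a block grid addressed by Int indices.
object BlockGridCheck {

  /** Blocks needed to cover `dim` entries, `perBlock` entries per block.
    * Uses division plus remainder rather than (dim + perBlock - 1) so the
    * computation cannot overflow for dim close to Long.MaxValue.
    */
  def numBlocks(dim: Long, perBlock: Int): Long = {
    require(perBlock > 0, s"perBlock needs to be greater than 0. perBlock: $perBlock")
    require(dim >= 0, s"dim needs to be non-negative. dim: $dim")
    dim / perBlock + (if (dim % perBlock == 0) 0L else 1L)
  }

  /** True when both block counts fit in an Int (assumed criterion). */
  def fitsInBlockMatrix(
      numRows: Long,
      numCols: Long,
      rowsPerBlock: Int,
      colsPerBlock: Int): Boolean = {
    numBlocks(numRows, rowsPerBlock) <= Int.MaxValue &&
      numBlocks(numCols, colsPerBlock) <= Int.MaxValue
  }
}
```

In toBlockMatrix, a check along these lines could be wrapped in a require
before any blocks are built, so that an oversized matrix fails fast with a
clear message instead of silently truncating block indices.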