GitHub user hhbyyh opened a pull request:
https://github.com/apache/spark/pull/4200
[Spark-5406][MLlib] LocalLAPACK mode in RowMatrix.computeSVD should have
much smaller upper bound
JIRA link: https://issues.apache.org/jira/browse/SPARK-5406
The upper bound comes from the work-array allocation in breeze's svd:
    val workSize = ( 3
      * scala.math.min(m, n)
      * scala.math.min(m, n)
      + scala.math.max(scala.math.max(m, n), 4 * scala.math.min(m, n)
        * scala.math.min(m, n) + 4 * scala.math.min(m, n))
    )
    val work = new Array[Double](workSize)
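As a result, with m = n the work size is 7 * n * n + 4 * n, which must be
less than Int.MaxValue at the very least (the exact array size limit
depends on the JVM). In some worse cases, like n = 25000, the overflowed
value wraps around and becomes positive again (7 * n * n evaluates to
80032704 in Int arithmetic), which leads to weird behavior.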
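The overflow described above can be reproduced with a small sketch. This is
not code from the PR or from breeze: the object WorkSizeSketch and its
helpers are hypothetical names, and it assumes a square n x n input so that
the work size reduces to 7 * n * n + 4 * n. It evaluates the size in Long
and compares it against Int.MaxValue.

```scala
// Hypothetical helper, not part of Spark or breeze: illustrates the Int
// overflow in the work-size formula, assuming a square matrix (m == n).
object WorkSizeSketch {
  // Work size 7 * n * n + 4 * n computed in Long, so it does not wrap
  // for the matrix sizes discussed here.
  def workSizeLong(n: Int): Long = {
    val k = n.toLong
    7L * k * k + 4L * k
  }

  // True when the work array length is representable as an Int.
  // The real JVM array limit may be slightly lower than Int.MaxValue.
  def fitsInInt(n: Int): Boolean = workSizeLong(n) <= Int.MaxValue.toLong

  def main(args: Array[String]): Unit = {
    val n = 25000
    println(7 * n * n)        // Int arithmetic wraps around: 80032704
    println(workSizeLong(n))  // true value: 4375100000
    println(fitsInInt(17514)) // true
    println(fitsInInt(17515)) // false
  }
}
```

Under the stated assumption, n = 17514 is the largest square size whose
work array length still fits in an Int.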
This PR is only a first step toward supporting Genbase
(http://www.paradigm4.com/wp-content/uploads/2014/06/Genomics-Benchmark-Technical-Report.pdf),
which needs to compute the SVD of matrices up to 60K * 70K. I found many
potential issues and would like to know whether there is any plan underway
to expand the range of matrix computation on Spark.
Thanks.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/hhbyyh/spark rowMatrix
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/4200.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #4200
----
commit e48a6e4f95f0f07e80ee63741fbf2ad546fe5919
Author: Yuhao Yang <[email protected]>
Date: 2015-01-27T07:02:36Z
make latent svd computation constraint clear
----