GitHub user hhbyyh opened a pull request:

    https://github.com/apache/spark/pull/4200

    [Spark-5406][MLlib] LocalLAPACK mode in RowMatrix.computeSVD should have much smaller upper bound

    JIRA link: https://issues.apache.org/jira/browse/SPARK-5406
    
    The following code in Breeze's svd imposes the upper bound:
          val workSize = ( 3
            * scala.math.min(m, n)
            * scala.math.min(m, n)
            + scala.math.max(scala.math.max(m, n), 4 * scala.math.min(m, n)
              * scala.math.min(m, n) + 4 * scala.math.min(m, n))
          )
          val work = new Array[Double](workSize)
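    A minimal sketch, assuming m and n are Ints as in the quoted snippet, that evaluates the same workSize formula once in Int arithmetic and once in Long arithmetic, so an overflowing size can be detected before any array is allocated. SvdWorkSize and its methods are hypothetical names, not part of Breeze or Spark.

```scala
// Hypothetical helper, not part of Breeze or Spark: evaluates the workSize
// formula quoted from Breeze's svd in both Int and Long arithmetic.
object SvdWorkSize {
  // Mirrors the quoted expression; with Int operands it wraps on overflow.
  def workSizeInt(m: Int, n: Int): Int = {
    val k = math.min(m, n)
    3 * k * k + math.max(math.max(m, n), 4 * k * k + 4 * k)
  }

  // Same formula promoted to Long; safe for dimensions up to roughly 1e9.
  def workSizeLong(m: Int, n: Int): Long = {
    val k = math.min(m, n).toLong
    3 * k * k + math.max(math.max(m, n).toLong, 4 * k * k + 4 * k)
  }

  // True when the exact size is at most Int.MaxValue (assumed JVM limit;
  // real JVMs may allow slightly shorter arrays than that).
  def fitsInInt(m: Int, n: Int): Boolean =
    workSizeLong(m, n) <= Int.MaxValue
}
```

    For m = n = 25000, workSizeLong returns 4375100000, while workSizeInt returns a much smaller positive number, so the Int result is silently wrong rather than obviously invalid. For the 60K * 70K Genbase case, workSizeLong(70000, 60000) is 25200240000 doubles, far beyond what a single JVM array can hold.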
    
    As a result, 7 * n * n + 4 * n must be less than Int.MaxValue at the very least (the exact limit depends on the JVM).
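    Assuming max(m, n) <= 4 * k * k + 4 * k with k = min(m, n), the quoted formula reduces to 7 * k * k + 4 * k. The sketch below uses hypothetical names and binary-searches for the largest k that still fits under a given array-length limit; passing a limit slightly below Int.MaxValue models a JVM whose maximum array length is smaller.

```scala
// Hypothetical helper: assuming max(m, n) <= 4k^2 + 4k with k = min(m, n),
// the quoted workSize reduces to 7k^2 + 4k.
object SvdUpperBound {
  def reducedWorkSize(k: Long): Long = 7 * k * k + 4 * k

  // Largest k with reducedWorkSize(k) <= limit, found by binary search.
  // The initial hi = 2^20 is large enough for any limit up to Int.MaxValue.
  def maxMinDim(limit: Long): Long = {
    require(limit >= 0, "limit must be non-negative")
    var lo = 0L       // invariant: reducedWorkSize(lo) <= limit
    var hi = 1L << 20 // invariant: the answer is <= hi
    while (lo < hi) {
      val mid = (lo + hi + 1) / 2 // upper midpoint so the loop always shrinks
      if (reducedWorkSize(mid) <= limit) lo = mid else hi = mid - 1
    }
    lo
  }
}
```

    With limit = Int.MaxValue this gives k = 17514 (7 * 17514^2 + 4 * 17514 = 2147251428), so min(m, n) = 17515 already exceeds Int.MaxValue. Lowering the limit by a few elements does not change the answer.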
    
    In worse cases, such as n = 25000, the work size wraps around to a positive value again (80032704) and causes weird behavior.
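    To see the wraparound mentioned above, this sketch evaluates 7 * n * n in Int arithmetic, which wraps modulo 2^32 on the JVM, next to the same expression in Long arithmetic and a checked variant built on java.lang.Math.multiplyExact, which throws ArithmeticException on overflow instead of wrapping. WrapAroundDemo and its methods are hypothetical names used only for illustration.

```scala
// Hypothetical demo object: 32-bit wraparound versus 64-bit and checked math.
object WrapAroundDemo {
  // Int multiplication silently wraps modulo 2^32 on overflow.
  def wrapped(n: Int): Int = 7 * n * n

  // Starting from a Long literal keeps the whole product in 64 bits.
  def exact(n: Int): Long = 7L * n * n

  // multiplyExact throws ArithmeticException instead of wrapping.
  def checked(n: Int): Int =
    java.lang.Math.multiplyExact(java.lang.Math.multiplyExact(7, n), n)
}
```

    wrapped(25000) is 80032704, the figure quoted above, whereas exact(25000) is 4375000000; already at n = 17516 the Int result is negative, and checked(25000) fails fast instead of returning a plausible-looking size.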
    
    This PR is only a first step toward supporting Genbase
    (http://www.paradigm4.com/wp-content/uploads/2014/06/Genomics-Benchmark-Technical-Report.pdf),
    which needs to compute the SVD of matrices up to 60K * 70K. I found many
    potential issues and would like to know whether any work is underway to
    expand the size of matrix computations that Spark can handle.
    Thanks.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hhbyyh/spark rowMatrix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4200.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4200
    
----
commit e48a6e4f95f0f07e80ee63741fbf2ad546fe5919
Author: Yuhao Yang <[email protected]>
Date:   2015-01-27T07:02:36Z

    make latent svd computation constraint clear

----


