Github user vrilleup commented on a diff in the pull request:
https://github.com/apache/spark/pull/964#discussion_r13903174
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala ---
@@ -220,16 +247,43 @@ class RowMatrix(
   }
   /**
-   * Computes the singular value decomposition of this matrix.
-   * Denote this matrix by A (m x n), this will compute matrices U, S, V such that A = U * S * V'.
+   * Computes the singular value decomposition of this matrix, using default tolerance (1e-9).
    *
-   * There is no restriction on m, but we require `n^2` doubles to fit in memory.
-   * Further, n should be less than m.
+   * @param k number of singular values to keep. We might return less than k if there are
+   *          numerically zero singular values. See rCond.
+   * @param computeU whether to compute U
+   * @param rCond the reciprocal condition number. All singular values smaller than rCond * sigma(0)
+   *              are treated as zero, where sigma(0) is the largest singular value.
+   * @return SingularValueDecomposition(U, s, V)
+   */
+  def computeSVD(
+      k: Int,
+      computeU: Boolean = false,
+      rCond: Double = 1e-9): SingularValueDecomposition[RowMatrix, Matrix] = {
+    if (numCols() < 100) {
--- End diff ---
Hi Reza, you are right, this looks like a magic number. As Xiangrui suggested, we could separate the API into dense and sparse implementations, and the magic number would then be gone.
Date: Tue, 17 Jun 2014 23:10:03 -0700
From: [email protected]
To: [email protected]
CC: [email protected]
Subject: Re: [spark] SPARK-1782: svd for sparse matrix using ARPACK (#964)
In mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala:
>    *
> -   * There is no restriction on m, but we require `n^2` doubles to fit in memory.
> -   * Further, n should be less than m.
> +   * @param k number of singular values to keep. We might return less than k if there are
> +   *          numerically zero singular values. See rCond.
> +   * @param computeU whether to compute U
> +   * @param rCond the reciprocal condition number. All singular values smaller than rCond * sigma(0)
> +   *              are treated as zero, where sigma(0) is the largest singular value.
> +   * @return SingularValueDecomposition(U, s, V)
> +   */
> +  def computeSVD(
> +      k: Int,
> +      computeU: Boolean = false,
> +      rCond: Double = 1e-9): SingularValueDecomposition[RowMatrix, Matrix] = {
> +    if (numCols() < 100) {
Please add a quick comment on why 100 is used; to an unfamiliar person it looks like a magic number.