[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

vrilleup Wed, 18 Jun 2014 00:51:08 -0700

Github user vrilleup commented on the pull request:

    https://github.com/apache/spark/pull/964#issuecomment-46405519
  
    Hi Xiangrui,
    Thank you for the comments! For the API, I think separating svd and svds
would be a better design. The user should choose which implementation (dense or
sparse) to use based on the application. For svd, we can keep the old
signature. For svds, I think it's necessary to expose tolerance and
max_iterations. There could be two computeSVD overloads:
        computeSVD(k, computeU) // 0 < k <= n
        computeSVD(k, computeU, rCond)
    and two computeSparseSVD overloads:
        computeSparseSVD(k, computeU) // 0 < k < n
        computeSparseSVD(k, computeU, rCond, tol, maxIterations)
    This will simplify calls from Java. What do you think?
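To make the proposed split concrete, here is a rough Python sketch of the two entry points and their different valid ranges for k. It only validates arguments and does not compute an SVD. The snake_case names, the explicit `n` argument (the number of columns), and every default value are placeholders chosen for illustration, not values taken from the PR.

```python
def _check_k(k, n, allow_full_rank):
    """Validate the requested number of singular values against n columns."""
    upper_ok = k <= n if allow_full_rank else k < n
    if not (k > 0 and upper_ok):
        bound = "<=" if allow_full_rank else "<"
        raise ValueError(f"need 0 < k {bound} n, got k={k}, n={n}")


def compute_svd(n, k, compute_u=False, r_cond=1e-9):
    """Dense path: 0 < k <= n, and no iteration controls are exposed."""
    _check_k(k, n, allow_full_rank=True)
    # Placeholder result: a real implementation would return U, s, V.
    return {"mode": "dense", "k": k, "compute_u": compute_u, "r_cond": r_cond}


def compute_sparse_svd(n, k, compute_u=False, r_cond=1e-9,
                       tol=1e-10, max_iterations=300):
    """Sparse (iterative) path: 0 < k < n, with tol and max_iterations."""
    _check_k(k, n, allow_full_rank=False)
    # Placeholder result: a real implementation would call an iterative solver.
    return {"mode": "sparse", "k": k, "compute_u": compute_u, "r_cond": r_cond,
            "tol": tol, "max_iterations": max_iterations}
```

With this shape, `compute_svd(5, 5)` is accepted, while `compute_sparse_svd(5, 5)` raises `ValueError` because the sparse path requires k to be strictly less than n.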
    One related API is the MatrixFactorizationModel in recommendation. It uses
the type RDD[(Int, Array[Double])] for both users and products. Since we will be
able to decompose large-scale matrices with sparse SVD, the Int index might be a
limitation. Would it be possible to change it to RDD[(Long, Array[Double])], or
more generally to RDD[(UType, Array[Double])] and RDD[(PType, Array[Double])]?
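As a small illustration of the Int limitation: a Scala Int is a 32-bit signed integer, so it cannot represent an index above 2^31 - 1. The Python sketch below emulates that width with `ctypes.c_int32`; the row id of three billion is a made-up example value.

```python
import ctypes

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1  # 2147483647, the largest 32-bit signed value


def fits_in_int32(index):
    """Return True if index is representable as a 32-bit signed integer."""
    return INT32_MIN <= index <= INT32_MAX


row_id = 3_000_000_000  # hypothetical row index in a very tall matrix

# ctypes performs no overflow check, so the value silently wraps around.
wrapped = ctypes.c_int32(row_id).value

print(fits_in_int32(row_id))  # False
print(wrapped)                # -1294967296
```

A 64-bit Long index avoids the wraparound for any matrix size that is realistic in practice.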
    Another question is about the computeSVD function in IndexedRowMatrix.
There the indices are directly zipped with the computed U. What if the order of
the rows of U changes while computing the SVD? Is preserving the ordering part
of the Spark API contract?
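One way to sidestep the ordering question is to keep each index attached to its row while U is computed, rather than zipping indices back on afterwards. The Python sketch below assumes U is formed row by row as A V diag(1/sigma); that is an assumption about the implementation, not something stated in this thread, and the function name and toy data are invented for illustration.

```python
import random


def u_rows_keyed(indexed_rows, v, sigma):
    """Return (index, u_row) pairs where u_row = row . V . diag(1/sigma)."""
    keyed = []
    for index, row in indexed_rows:
        u_row = [
            sum(a * v[j][c] for j, a in enumerate(row)) / sigma[c]
            for c in range(len(sigma))
        ]
        keyed.append((index, u_row))
    return keyed


# Toy 3x2 matrix with non-contiguous row indices. V and sigma are chosen by
# hand so that A = U * diag(sigma) * V^T holds exactly.
rows = [(10, [3.0, 0.0]), (20, [0.0, 2.0]), (30, [0.0, 0.0])]
v = [[1.0, 0.0], [0.0, 1.0]]
sigma = [3.0, 2.0]

# Simulate the rows being processed in a different order.
shuffled = rows[:]
random.Random(0).shuffle(shuffled)

# Each index travels with its own row, so the processing order does not matter.
u_by_index = dict(u_rows_keyed(shuffled, v, sigma))
```

Because the key is carried through the per-row computation, `u_by_index[10]` is `[1.0, 0.0]` whatever order the rows were processed in. A positional zip gives the same answer only if the row order is guaranteed to be preserved.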
    Cheers, Li
    Date: Tue, 17 Jun 2014 22:55:58 -0700
    From: [email protected]
    To: [email protected]
    CC: [email protected]
    Subject: Re: [spark] SPARK-1782: svd for sparse matrix using ARPACK (#964)
    
    @vrilleup Thanks for updating the PR! I made a comment on the explicit type 
checks. I'm a little confused about the new API. If isDenseSVD is true, tol 
doesn't mean anything. In MATLAB and SciPy, svds and svd are separate 
functions. We should either do the same or hide the parameters from users. 
Having computeSVD(k, computeU) should be sufficient for most users. What do you 
think?
    
    



