Github user vrilleup commented on a diff in the pull request:
https://github.com/apache/spark/pull/964#discussion_r13903174
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala ---
@@ -220,16 +247,43 @@ class RowMatrix(
   }
   /**
-   * Computes the singular value decomposition of this matrix.
-   * Denote this matrix by A (m x n), this will compute matrices U, S, V such that A = U * S * V'.
+   * Computes the singular value decomposition of this matrix, using default tolerance (1e-9).
    *
-   * There is no restriction on m, but we require `n^2` doubles to fit in memory.
-   * Further, n should be less than m.
+   * @param k number of singular values to keep. We might return less than k if there are
+   *          numerically zero singular values. See rCond.
+   * @param computeU whether to compute U
+   * @param rCond the reciprocal condition number. All singular values smaller than rCond * sigma(0)
+   *              are treated as zero, where sigma(0) is the largest singular value.
+   * @return SingularValueDecomposition(U, s, V)
+   */
+  def computeSVD(
+      k: Int,
+      computeU: Boolean = false,
+      rCond: Double = 1e-9): SingularValueDecomposition[RowMatrix, Matrix] = {
+    if (numCols() < 100) {
--- End diff ---
Hi Reza, you are right, this looks like a magic number. As Xiangrui suggested, we could separate the API into dense and sparse implementations, and the magic number would then be gone.
Date: Tue, 17 Jun 2014 23:10:03 -0700
From: [email protected]
To: [email protected]
CC: [email protected]
Subject: Re: [spark] SPARK-1782: svd for sparse matrix using ARPACK (#964)
In mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala:
>    *
> -   * There is no restriction on m, but we require `n^2` doubles to fit in memory.
> -   * Further, n should be less than m.
> +   * @param k number of singular values to keep. We might return less than k if there are
> +   *          numerically zero singular values. See rCond.
> +   * @param computeU whether to compute U
> +   * @param rCond the reciprocal condition number. All singular values smaller than rCond * sigma(0)
> +   *              are treated as zero, where sigma(0) is the largest singular value.
> +   * @return SingularValueDecomposition(U, s, V)
> +   */
> +  def computeSVD(
> +      k: Int,
> +      computeU: Boolean = false,
> +      rCond: Double = 1e-9): SingularValueDecomposition[RowMatrix, Matrix] = {
> +    if (numCols() < 100) {
Please add a quick comment on why 100 is used; to an unfamiliar person it looks like a magic number.