These are very helpful answers, guys; thanks!
On May 23, 2013, at 6:26 PM, Ted Dunning <[email protected]> wrote:
> The SVD of a matrix is not unique. You can flip the sign of any matching
> pair of left and right singular vectors, and rearrange the singular values
> as well. Customary practice is to order by decreasing singular value, but
> that doesn't make the SVD unique.
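
To make the sign ambiguity concrete, here is a minimal sketch in R. It assumes the 3x3 matrix from the original question (named mp, as in the R session quoted further down); the names flip, u2 and v2 are only illustrative. It negates the second left and second right singular vector together and checks that the product still reproduces the matrix.

```r
# Matrix from the original question.
mp <- matrix(c(0.00, 0.25, 0.25,
               0.75, 0.00, 0.25,
               0.25, 0.75, 0.50), nrow = 3, byrow = TRUE)

s <- svd(mp)

# Flip the sign of the second column of BOTH u and v.
flip <- diag(c(1, -1, 1))
u2 <- s$u %*% flip
v2 <- s$v %*% flip

# Reconstruction error for the original and the sign-flipped factors.
err_original <- max(abs(s$u %*% diag(s$d) %*% t(s$v) - mp))
err_flipped  <- max(abs(u2 %*% diag(s$d) %*% t(v2) - mp))
print(c(err_original, err_flipped))
```

Both errors should be at rounding level, so either set of factors is an equally valid SVD of mp.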
>
> Regarding the number of singular values, R's svd routine computes all of
> the singular values. The nu and nv parameters that you are setting control
> the number of singular VECTORS that are computed, not the number of
> singular VALUES.
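
A short sketch of the nu/nv behaviour described above, again assuming the 3x3 matrix mp from the original question. The truncation to k values is done by hand here; k, d_k and approx are illustrative names, not part of R's svd interface.

```r
mp <- matrix(c(0.00, 0.25, 0.25,
               0.75, 0.00, 0.25,
               0.25, 0.75, 0.50), nrow = 3, byrow = TRUE)

# nu and nv limit the number of singular VECTORS returned ...
s <- svd(mp, nu = 2, nv = 2)
print(dim(s$u))     # 3 rows, 2 columns
print(dim(s$v))     # 3 rows, 2 columns

# ... but d still holds every singular value of the matrix.
print(length(s$d))

# For a rank-k decomposition, truncate d yourself.
k <- 2
d_k <- s$d[1:k]
approx <- s$u %*% diag(d_k) %*% t(s$v)

# The spectral-norm error of the rank-k approximation is the first
# discarded singular value.
print(c(norm(mp - approx, type = "2"), s$d[k + 1]))
```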
>
> If you want to experiment, here is an in-memory implementation of
> stochastic SVD in R. This lets you play with various combinations of
> parameters.
>
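> incore = function(A, k, p = 10, q = 2) {
>   # Power iterations: replace A by A (A'A)^q, which keeps the singular
>   # vectors but raises each singular value to the power 2q+1.
>   if (q > 0) {
>     Z = A
>     for (i in 1:q) {
>       Z = Z %*% t(A) %*% A
>     }
>     A = Z
>   }
>   n = dim(A)[1]
>   m = dim(A)[2]
>   # Random projection onto k+p columns, then orthonormalize.
>   Y = A %*% matrix(rnorm((k+p) * m), ncol=k+p)
>   Q = qr.Q(qr(Y))
>   rm(Y)
>   # Project onto Q and take the SVD of the small (k+p) x (k+p) factor.
>   B = t(Q) %*% A
>   lq = qr(t(B))
>   L = t(qr.R(lq))
>   s = svd(L)
>   U = Q %*% s$u
>   V = qr.Q(lq) %*% s$v
>   # Undo the power 2q+1 on the singular values; k+p values are returned.
>   return (list(u=U, v=V, d=(s$d^(1/(2*q+1)))))
> }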
>
>
> In order to produce interesting data for this, I recommend something like
> this:
>
> A = matrix(rnorm(1000*1000), ncol=1000)
> for (i in 1:100) {A[,i] = i^4 * A[,i]}
> plot(svd(A)$d[1:30])
> A = A/1e9
>
> The idea here is that you want a range of singular values.
>
> Using this, you can trade off the padding (p) versus the power iterations
> (q).
>
> This combination, for instance, gives me errors of about 1e-13 versus the
> internal R algorithm.
>
> s = svd(A)$d[1:20]
> plot(s-incore(A,k=20,p=55,q=1)$d[1:20])
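
The trade-off between padding (p) and power iterations (q) can be explored with a small sweep. This is only a sketch under a few assumptions: incore is copied from the message so the block runs on its own, the matrix is 500 x 500 rather than 1000 x 1000 to keep the runtime short, the seed is fixed for repeatability, and the grid of p and q values is arbitrary. Only the singular values are compared.

```r
set.seed(42)  # assumption: fixed seed so the run is repeatable

# Same algorithm as incore() in the message, restated compactly.
incore <- function(A, k, p = 10, q = 2) {
  A0 <- A
  for (i in seq_len(q)) {
    A <- A %*% t(A0) %*% A0          # power iteration: A (A'A)^q
  }
  m <- ncol(A)
  Y <- A %*% matrix(rnorm((k + p) * m), ncol = k + p)
  Q <- qr.Q(qr(Y))
  B <- t(Q) %*% A
  lq <- qr(t(B))
  L <- t(qr.R(lq))
  s <- svd(L)
  list(u = Q %*% s$u, v = qr.Q(lq) %*% s$v, d = s$d^(1 / (2 * q + 1)))
}

# Test data as in the message, but smaller (assumption: n = 500).
n <- 500
A <- matrix(rnorm(n * n), ncol = n)
for (i in 1:100) A[, i] <- i^4 * A[, i]
A <- A / 1e9

k <- 20
exact <- svd(A, nu = 0, nv = 0)$d[1:k]

# Largest absolute error in the top k singular values for each (p, q).
grid <- expand.grid(p = c(0, 10, 55), q = c(0, 1))
grid$max_err <- mapply(function(p, q) {
  max(abs(exact - incore(A, k = k, p = p, q = q)$d[1:k]))
}, grid$p, grid$q)
print(grid)
```

Each row of grid shows how far the stochastic estimate is from R's svd for one combination, which makes it easy to see where extra padding or an extra power iteration stops paying off.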
>
>
>
>
>
> On Thu, May 23, 2013 at 3:32 PM, Andrew Musselman <[email protected]> wrote:
>
>> Wouldn't I expect to get similar results using Mahout's SSVD vs. R's SVD?
>>
>> Note the second component of each vector in U and V is the negative of what
>> R gives me. Also, R includes a third singular value even when I ask it to
>> calculate a rank-2 decomposition.
>>
>> The output of Mahout's SSVD run on the 3x3 matrix
>> $ cat a
>> 1 (0.0,0.25,0.25)
>> 2 (0.75,0.0,0.25)
>> 3 (0.25,0.75,0.5)
>>
>> $ mahout ssvd -k 2 -p 1 -q 1 --input kv-pairs --output ssvd-out --tempDir tmp-ssvd-2 --reduceTasks 1
>> $ mahout seqdumper -i ssvd-out/U -o ssvd-dump-U -b 200
>> $ mahout seqdumper -i ssvd-out/V -o ssvd-dump-V -b 200
>> $ mahout seqdumper -i ssvd-out/sigma -o ssvd-dump-sigma -b 200
>>
>> $ cat ssvd-dump-U; cat ssvd-dump-V; cat ssvd-dump-sigma
>> Input Path: hdfs://localhost:9010/user/akm/ssvd-out/U/part-m-00000
>> Key class: class org.apache.hadoop.io.IntWritable Value Class: class org.apache.mahout.math.VectorWritable
>> Key: 1: Value: {0:-0.27511654723856177,1:-0.2590650410646752}
>> Key: 2: Value: {0:-0.5012740900141649,1:0.8604052567841447}
>> Key: 3: Value: {0:-0.8203872086496734,1:-0.43884860555363264}
>> Count: 3
>> Input Path: hdfs://localhost:9010/user/akm/ssvd-out/V/part-m-00000
>> Key class: class org.apache.hadoop.io.IntWritable Value Class: class org.apache.mahout.math.VectorWritable
>> Key: 0: Value: {0:-0.5370130951532543,1:0.8012749902922572}
>> Key: 1: Value: {0:-0.6322223639715111,1:-0.5893002821703531}
>> Key: 2: Value: {0:-0.5584906607349807,1:-0.10336134367394931}
>> Count: 3
>> Input Path: ssvd-out/sigma
>> Key class: class org.apache.hadoop.io.IntWritable Value Class: class org.apache.mahout.math.VectorWritable
>> Key: 0: Value: {0:1.0820078223739025,1:0.6684244456504859}
>> Count: 1
>>
>> Versus the output of R's SVD run on the same 3x3 matrix
>> > mp
>>      [,1] [,2] [,3]
>> [1,] 0.00 0.25 0.25
>> [2,] 0.75 0.00 0.25
>> [3,] 0.25 0.75 0.50
>>
>> > s <- svd(mp,2,2)
>> > s
>> $d
>> [1] 1.08200782 0.66842445 0.08641662
>>
>> $u
>>            [,1]       [,2]
>> [1,] -0.2751165  0.2590650
>> [2,] -0.5012741 -0.8604053
>> [3,] -0.8203872  0.4388486
>>
>> $v
>>            [,1]       [,2]
>> [1,] -0.5370131 -0.8012750
>> [2,] -0.6322224  0.5893003
>> [3,] -0.5584907  0.1033613
>>
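
One way to compare the two outputs above despite the sign difference is to align each pair of columns before subtracting. The sketch below copies the Mahout U, V and sigma values from the dump quoted above; the names mahout_u, mahout_v, mahout_d, signs, aligned_u and aligned_v are illustrative, and the sign rule (inner product of matching U columns) is just one reasonable choice.

```r
mp <- matrix(c(0.00, 0.25, 0.25,
               0.75, 0.00, 0.25,
               0.25, 0.75, 0.50), nrow = 3, byrow = TRUE)

# Values copied from the Mahout SSVD dump in this thread.
mahout_u <- matrix(c(-0.27511654723856177, -0.2590650410646752,
                     -0.5012740900141649,   0.8604052567841447,
                     -0.8203872086496734,  -0.43884860555363264),
                   ncol = 2, byrow = TRUE)
mahout_v <- matrix(c(-0.5370130951532543,   0.8012749902922572,
                     -0.6322223639715111,  -0.5893002821703531,
                     -0.5584906607349807,  -0.10336134367394931),
                   ncol = 2, byrow = TRUE)
mahout_d <- c(1.0820078223739025, 0.6684244456504859)

s <- svd(mp, nu = 2, nv = 2)

# +1 if a Mahout column points the same way as R's column, -1 otherwise.
signs <- sign(colSums(mahout_u * s$u))

# Apply the SAME sign to the matching columns of U and V.
aligned_u <- sweep(mahout_u, 2, signs, `*`)
aligned_v <- sweep(mahout_v, 2, signs, `*`)

print(max(abs(aligned_u - s$u)))
print(max(abs(aligned_v - s$v)))
print(max(abs(mahout_d - s$d[1:2])))
```

After alignment the remaining differences should be small, which is consistent with the two decompositions differing only by the sign of the second pair of singular vectors.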