These are very helpful answers guys; thanks!
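
For anyone finding this thread later, here is a minimal sketch in R of the two points Ted makes below, using the 3x3 matrix and the Mahout output quoted further down. The names U_mahout, V_mahout, sigma_mahout, and rebuild are just labels chosen for illustration; the numbers are copied from the seqdumper output.

```r
# The 3x3 matrix from the thread.
mp <- matrix(c(0.00, 0.25, 0.25,
               0.75, 0.00, 0.25,
               0.25, 0.75, 0.50), nrow = 3, byrow = TRUE)

# nu and nv limit the singular VECTORS returned; d still holds all values.
s <- svd(mp, nu = 2, nv = 2)
length(s$d)   # 3
dim(s$u)      # 3 2
dim(s$v)      # 3 2

# Mahout's U, V and sigma, copied from the seqdumper output in the thread.
U_mahout <- matrix(c(-0.27511654723856177, -0.2590650410646752,
                     -0.5012740900141649,   0.8604052567841447,
                     -0.8203872086496734,  -0.43884860555363264),
                   nrow = 3, byrow = TRUE)
V_mahout <- matrix(c(-0.5370130951532543,   0.8012749902922572,
                     -0.6322223639715111,  -0.5893002821703531,
                     -0.5584906607349807,  -0.10336134367394931),
                   nrow = 3, byrow = TRUE)
sigma_mahout <- c(1.0820078223739025, 0.6684244456504859)

# Rank-2 product U diag(d) V'. Flipping the sign of a matching pair of
# columns in U and V leaves this product unchanged.
rebuild <- function(u, d, v) u %*% diag(d) %*% t(v)
A2_r      <- rebuild(s$u, s$d[1:2], s$v)
A2_mahout <- rebuild(U_mahout, sigma_mahout, V_mahout)

# Largest entrywise difference between the two rank-2 products.
max(abs(A2_r - A2_mahout))
```

The last line should print a value close to zero, which would mean the two decompositions describe the same rank-2 matrix even though the second columns of U and V have opposite signs.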

On May 23, 2013, at 6:26 PM, Ted Dunning <[email protected]> wrote:

> The SVD of a matrix is not unique.  You can flip the sign of a matching
> pair of left and right singular vectors, and you can rearrange the singular
> values (together with their vectors) as well.  Customary practice is to
> order by the square of the singular value, but that doesn't make the SVD
> unique.
> 
> Regarding the number of singular values, R's svd routine computes all of
> the singular values.  The nu and nv parameters that you are setting control
> the number of singular VECTORS that are computed, not the number of
> singular VALUES.
> 
> If you want to experiment, here is an in-memory implementation of
> stochastic SVD in R.  This lets you play with various combinations of
> parameters.
> 
> incore = function(A, k, p = 10, q = 2) {
>  if (q > 0) {
>    Z = A
>    for (i in 1:q) {
>      Z = Z %*% t(A) %*% A
>    }
>    A = Z
>  }
>  n = dim(A)[1]
>  m = dim(A)[2]
>  Y = A %*% matrix(rnorm((k+p) * m), ncol=k+p)
>  Q = qr.Q(qr(Y))
>  rm(Y)
>  B = t(Q) %*% A
>  lq = qr(t(B))
>  L = t(qr.R(lq))
>  s = svd(L)
>  U = Q %*% s$u
>  V = qr.Q(lq) %*% s$v
>  return (list(u=U, v=V, d=(s$d^(1/(2*q+1)))))
> }
> 
> 
> In order to produce interesting data for this, I recommend something like
> this:
> 
> A = matrix(rnorm(1000*1000), ncol=1000)
> for (i in 1:100) {A[,i] = i^4 * A[,i]}
> plot(svd(A)$d[1:30])
> A = A/1e9
> 
> The idea here is that you want a range of singular values.
> 
> Using this, you can trade off the padding (p) versus the power iterations
> (q).
> 
> This combination, for instance, gives me errors of about 1e-13 versus the
> internal R algorithm.
> 
> s = svd(A)$d[1:20]
> plot(s-incore(A,k=20,p=55,q=1)$d[1:20])
> 
> 
> 
> 
> 
> On Thu, May 23, 2013 at 3:32 PM, Andrew Musselman <
> [email protected]> wrote:
> 
>> Wouldn't I expect to get similar results using Mahout's SSVD vs. R's SVD?
>> 
>> Note the second component of each vector in U and V is the negative of what
>> R gives me.  Also, R includes a third singular value even when I ask it to
>> calculate a rank-2 decomposition.
>> 
>> The output of Mahout's SSVD run on the 3x3 matrix
>> $ cat a
>> 1 (0.0,0.25,0.25)
>> 2 (0.75,0.0,0.25)
>> 3 (0.25,0.75,0.5)
>> 
>> $ mahout ssvd -k 2 -p 1 -q 1 --input kv-pairs --output ssvd-out --tempDir
>> tmp-ssvd-2 --reduceTasks 1
>> $ mahout seqdumper -i ssvd-out/U -o ssvd-dump-U -b 200
>> $ mahout seqdumper -i ssvd-out/V -o ssvd-dump-V -b 200
>> $ mahout seqdumper -i ssvd-out/sigma -o ssvd-dump-sigma -b 200
>> 
>> $ cat ssvd-dump-U; cat ssvd-dump-V; cat ssvd-dump-sigma
>> Input Path: hdfs://localhost:9010/user/akm/ssvd-out/U/part-m-00000
>> Key class: class org.apache.hadoop.io.IntWritable Value Class: class
>> org.apache.mahout.math.VectorWritable
>> Key: 1: Value: {0:-0.27511654723856177,1:-0.2590650410646752}
>> Key: 2: Value: {0:-0.5012740900141649,1:0.8604052567841447}
>> Key: 3: Value: {0:-0.8203872086496734,1:-0.43884860555363264}
>> Count: 3
>> Input Path: hdfs://localhost:9010/user/akm/ssvd-out/V/part-m-00000
>> Key class: class org.apache.hadoop.io.IntWritable Value Class: class
>> org.apache.mahout.math.VectorWritable
>> Key: 0: Value: {0:-0.5370130951532543,1:0.8012749902922572}
>> Key: 1: Value: {0:-0.6322223639715111,1:-0.5893002821703531}
>> Key: 2: Value: {0:-0.5584906607349807,1:-0.10336134367394931}
>> Count: 3
>> Input Path: ssvd-out/sigma
>> Key class: class org.apache.hadoop.io.IntWritable Value Class: class
>> org.apache.mahout.math.VectorWritable
>> Key: 0: Value: {0:1.0820078223739025,1:0.6684244456504859}
>> Count: 1
>> 
>> Versus the output of R's SVD run on the same 3x3 matrix
>> > mp
>>     [,1] [,2] [,3]
>> [1,] 0.00 0.25 0.25
>> [2,] 0.75 0.00 0.25
>> [3,] 0.25 0.75 0.50
>> 
>> > s <- svd(mp,2,2)
>> > s
>> $d
>> [1] 1.08200782 0.66842445 0.08641662
>> 
>> $u
>>           [,1]       [,2]
>> [1,] -0.2751165  0.2590650
>> [2,] -0.5012741 -0.8604053
>> [3,] -0.8203872  0.4388486
>> 
>> $v
>>           [,1]       [,2]
>> [1,] -0.5370131 -0.8012750
>> [2,] -0.6322224  0.5893003
>> [3,] -0.5584907  0.1033613
>> 
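
P.S. On the padding (p) versus power iterations (q) trade-off Ted describes above, here is a rough sketch for scanning a few combinations. It is a simplified variant and not the incore function quoted above: the helper approx_sv is hypothetical, it applies the power iterations to the random sketch rather than to A itself, and it returns only singular values. The test matrix is a smaller version of Ted's example, and the seed and grid values are arbitrary.

```r
set.seed(42)  # arbitrary seed so the run is repeatable

# Test matrix with a wide range of singular values, scaled down from the
# 1000x1000 example in the thread to keep the run quick.
n <- 200
A <- matrix(rnorm(n * n), ncol = n)
for (i in 1:50) A[, i] <- i^4 * A[, i]
A <- A / max(svd(A)$d)   # normalize so the largest singular value is 1

# Hypothetical helper: approximate the top k singular values using
# oversampling p and q power iterations on the sketch Y.
approx_sv <- function(A, k, p, q) {
  Y <- A %*% matrix(rnorm(ncol(A) * (k + p)), ncol = k + p)
  for (i in seq_len(q)) {
    Y <- qr.Q(qr(Y))           # re-orthonormalize before each power step
    Y <- A %*% (t(A) %*% Y)
  }
  Q <- qr.Q(qr(Y))             # orthonormal basis for the sampled range
  svd(t(Q) %*% A)$d[1:k]       # singular values of the projected matrix
}

exact <- svd(A)$d[1:20]
grid <- expand.grid(p = c(5, 20, 55), q = c(0, 1, 2))
grid$max_err <- mapply(
  function(p, q) max(abs(exact - approx_sv(A, 20, p, q))),
  grid$p, grid$q)
print(grid)
```

Each row of the printed table is one (p, q) combination with the largest absolute error over the top 20 singular values, so you can see how much extra padding or how many power iterations a given accuracy costs on this particular matrix.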
