On Thu, May 23, 2013 at 3:32 PM, Andrew Musselman <
[email protected]> wrote:

> Wouldn't I expect to get similar results using Mahout's SSVD vs. R's SVD?
>
> Note the second component of each vector in U and V is the negative of what
> R gives me.


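The SVD is not unique: negating a left singular vector together with its
matching right singular vector gives another equally valid decomposition.
You may see the same sign flips when running it through any other library.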
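
A minimal sketch of the sign symmetry, using the U, V and sigma values
quoted in this thread (rounded to seven digits). The helper names
flip_column and reconstruct are made up for illustration and belong to
neither Mahout nor R. Negating the second column of both U and V turns
the Mahout output into the R output and leaves U * diag(sigma) * V^T
unchanged:

```python
# Values copied from the SSVD output quoted in this thread (rounded).
sigma = [1.0820078, 0.6684244]

U_mahout = [[-0.2751165, -0.2590650],
            [-0.5012741,  0.8604053],
            [-0.8203872, -0.4388486]]
V_mahout = [[-0.5370131,  0.8012750],
            [-0.6322224, -0.5893003],
            [-0.5584907, -0.1033613]]


def flip_column(m, j):
    """Return a copy of matrix m with column j negated (hypothetical helper)."""
    return [[-x if c == j else x for c, x in enumerate(row)] for row in m]


def reconstruct(U, s, V):
    """Compute U * diag(s) * V^T using plain nested lists."""
    return [[sum(U[i][k] * s[k] * V[j][k] for k in range(len(s)))
             for j in range(len(V))]
            for i in range(len(U))]


# R's u and v are Mahout's U and V with the second column negated.
U_r = flip_column(U_mahout, 1)
V_r = flip_column(V_mahout, 1)

A_mahout = reconstruct(U_mahout, sigma, V_mahout)
A_r = reconstruct(U_r, sigma, V_r)

# The two sign conventions describe the same rank-2 matrix.
max_diff = max(abs(a - b)
               for row_a, row_b in zip(A_mahout, A_r)
               for a, b in zip(row_a, row_b))
print("largest entrywise difference:", max_diff)
```

The two minus signs cancel in each product u * sigma * v, so both
outputs describe the same matrix.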

> Also, R includes a third singular value even when I ask it to
> calculate a rank-2 decomposition.
>

If you ask for a reduced-rank decomposition, you get what you ask for. R
seems to always compute the full set of singular values, but that doesn't
mean R is right to show you more.
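
To illustrate what the rank-2 result represents, here is a rough sketch
using the numbers quoted in this thread (rounded to seven digits; the
variable names are only illustrative). It rebuilds the matrix from the
two singular triplets Mahout returned and compares it with the input.
For a truncated SVD the largest entrywise error cannot exceed the first
dropped singular value, which is the third value R prints:

```python
# Input matrix and rank-2 factors as quoted in this thread (rounded).
A = [[0.00, 0.25, 0.25],
     [0.75, 0.00, 0.25],
     [0.25, 0.75, 0.50]]

sigma = [1.0820078, 0.6684244]   # the two singular values from SSVD (-k 2)
sigma3 = 0.08641662              # the extra third value shown in R's $d

U = [[-0.2751165, -0.2590650],
     [-0.5012741,  0.8604053],
     [-0.8203872, -0.4388486]]
V = [[-0.5370131,  0.8012750],
     [-0.6322224, -0.5893003],
     [-0.5584907, -0.1033613]]

# Rank-2 approximation: A2 = U * diag(sigma) * V^T.
A2 = [[sum(U[i][k] * sigma[k] * V[j][k] for k in range(2))
       for j in range(3)]
      for i in range(3)]

# Largest entrywise gap between the input and its rank-2 approximation.
max_err = max(abs(A[i][j] - A2[i][j]) for i in range(3) for j in range(3))

print("max entrywise error:", max_err)
print("dropped singular value:", sigma3)
```

The third singular value only measures what the rank-2 result leaves
out; it is not part of the rank-2 decomposition itself.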


>
> The output of Mahout's SSVD run on the 3x3 matrix
> $ cat a
> 1 (0.0,0.25,0.25)
> 2 (0.75,0.0,0.25)
> 3 (0.25,0.75,0.5)
>
> $ mahout ssvd -k 2 -p 1 -q 1 --input kv-pairs --output ssvd-out --tempDir
> tmp-ssvd-2 --reduceTasks 1
> $ mahout seqdumper -i ssvd-out/U -o ssvd-dump-U -b 200
> $ mahout seqdumper -i ssvd-out/V -o ssvd-dump-V -b 200
> $ mahout seqdumper -i ssvd-out/sigma -o ssvd-dump-sigma -b 200
>
> $ cat ssvd-dump-U; cat ssvd-dump-V; cat ssvd-dump-sigma
> Input Path: hdfs://localhost:9010/user/akm/ssvd-out/U/part-m-00000
> Key class: class org.apache.hadoop.io.IntWritable Value Class: class
> org.apache.mahout.math.VectorWritable
> Key: 1: Value: {0:-0.27511654723856177,1:-0.2590650410646752}
> Key: 2: Value: {0:-0.5012740900141649,1:0.8604052567841447}
> Key: 3: Value: {0:-0.8203872086496734,1:-0.43884860555363264}
> Count: 3
> Input Path: hdfs://localhost:9010/user/akm/ssvd-out/V/part-m-00000
> Key class: class org.apache.hadoop.io.IntWritable Value Class: class
> org.apache.mahout.math.VectorWritable
> Key: 0: Value: {0:-0.5370130951532543,1:0.8012749902922572}
> Key: 1: Value: {0:-0.6322223639715111,1:-0.5893002821703531}
> Key: 2: Value: {0:-0.5584906607349807,1:-0.10336134367394931}
> Count: 3
> Input Path: ssvd-out/sigma
> Key class: class org.apache.hadoop.io.IntWritable Value Class: class
> org.apache.mahout.math.VectorWritable
> Key: 0: Value: {0:1.0820078223739025,1:0.6684244456504859}
> Count: 1
>
> Versus the output of R's SVD run on the same 3x3 matrix
> > mp
>      [,1] [,2] [,3]
> [1,] 0.00 0.25 0.25
> [2,] 0.75 0.00 0.25
> [3,] 0.25 0.75 0.50
>
> > s <- svd(mp,2,2)
> > s
> $d
> [1] 1.08200782 0.66842445 0.08641662
>
> $u
>            [,1]       [,2]
> [1,] -0.2751165  0.2590650
> [2,] -0.5012741 -0.8604053
> [3,] -0.8203872  0.4388486
>
> $v
>            [,1]       [,2]
> [1,] -0.5370131 -0.8012750
> [2,] -0.6322224  0.5893003
> [3,] -0.5584907  0.1033613
>
