The SSVD is a partial SVD, so with -k 2 you won't get all 3 singular values. The randomized algorithm implemented in SSVD is only efficient when you need a few singular values (cf. http://math.berkeley.edu/~strain/273.F10/martinsson.tygert.rokhlin.randomized.decomposition.pdf and http://arxiv.org/abs/0909.4061 ). A full SVD on Mahout-sized data is probably something you don't want to do anyway.
The SVD is only unique up to sign: flipping the sign of a column of U together with the matching column of V leaves the product unchanged (cf. http://en.wikipedia.org/wiki/Singular_value_decomposition ), so the results you're getting are indeed similar.

On Thu, May 23, 2013 at 3:32 PM, Andrew Musselman <[email protected]> wrote:
> Wouldn't I expect to get similar results using Mahout's SSVD vs. R's SVD?
>
> Note the second component of each vector in U and V is the negative of what
> R gives me. Also, R includes a third singular value even when I ask it to
> calculate a rank-2 decomposition.
>
> The output of Mahout's SSVD run on the 3x3 matrix
> $ cat a
> 1 (0.0,0.25,0.25)
> 2 (0.75,0.0,0.25)
> 3 (0.25,0.75,0.5)
>
> $ mahout ssvd -k 2 -p 1 -q 1 --input kv-pairs --output ssvd-out --tempDir
> tmp-ssvd-2 --reduceTasks 1
> $ mahout seqdumper -i ssvd-out/U -o ssvd-dump-U -b 200
> $ mahout seqdumper -i ssvd-out/V -o ssvd-dump-V -b 200
> $ mahout seqdumper -i ssvd-out/sigma -o ssvd-dump-sigma -b 200
>
> $ cat ssvd-dump-U; cat ssvd-dump-V; cat ssvd-dump-sigma
> Input Path: hdfs://localhost:9010/user/akm/ssvd-out/U/part-m-00000
> Key class: class org.apache.hadoop.io.IntWritable Value Class: class
> org.apache.mahout.math.VectorWritable
> Key: 1: Value: {0:-0.27511654723856177,1:-0.2590650410646752}
> Key: 2: Value: {0:-0.5012740900141649,1:0.8604052567841447}
> Key: 3: Value: {0:-0.8203872086496734,1:-0.43884860555363264}
> Count: 3
> Input Path: hdfs://localhost:9010/user/akm/ssvd-out/V/part-m-00000
> Key class: class org.apache.hadoop.io.IntWritable Value Class: class
> org.apache.mahout.math.VectorWritable
> Key: 0: Value: {0:-0.5370130951532543,1:0.8012749902922572}
> Key: 1: Value: {0:-0.6322223639715111,1:-0.5893002821703531}
> Key: 2: Value: {0:-0.5584906607349807,1:-0.10336134367394931}
> Count: 3
> Input Path: ssvd-out/sigma
> Key class: class org.apache.hadoop.io.IntWritable Value Class: class
> org.apache.mahout.math.VectorWritable
> Key: 0: Value: {0:1.0820078223739025,1:0.6684244456504859}
> Count: 1
>
> Versus the output of R's SVD run on the same 3x3 matrix
> > mp
>      [,1] [,2] [,3]
> [1,] 0.00 0.25 0.25
> [2,] 0.75 0.00 0.25
> [3,] 0.25 0.75 0.50
>
> > s <- svd(mp,2,2)
> > s
> $d
> [1] 1.08200782 0.66842445 0.08641662
>
> $u
>            [,1]       [,2]
> [1,] -0.2751165  0.2590650
> [2,] -0.5012741 -0.8604053
> [3,] -0.8203872  0.4388486
>
> $v
>            [,1]       [,2]
> [1,] -0.5370131 -0.8012750
> [2,] -0.6322224  0.5893003
> [3,] -0.5584907  0.1033613
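
For what it's worth, here is a quick sanity check of the sign point, as a minimal Python sketch (plain lists, no libraries). It uses only the U, V and sigma values quoted above; the helper names `reconstruct` and `max_abs_diff` are made up for this example. It rebuilds the rank-2 product U * diag(sigma) * V^T from both outputs and compares them. Because the second column is negated in both U and V, the two sign flips cancel and the products should agree up to the rounding in R's printed digits.

```python
# Values copied from the quoted Mahout SSVD output.
sigma = [1.0820078223739025, 0.6684244456504859]
U_mahout = [
    [-0.27511654723856177, -0.2590650410646752],
    [-0.5012740900141649, 0.8604052567841447],
    [-0.8203872086496734, -0.43884860555363264],
]
V_mahout = [
    [-0.5370130951532543, 0.8012749902922572],
    [-0.6322223639715111, -0.5893002821703531],
    [-0.5584906607349807, -0.10336134367394931],
]

# Values copied from the quoted R svd() output (printed to 7 digits).
U_r = [
    [-0.2751165, 0.2590650],
    [-0.5012741, -0.8604053],
    [-0.8203872, 0.4388486],
]
V_r = [
    [-0.5370131, -0.8012750],
    [-0.6322224, 0.5893003],
    [-0.5584907, 0.1033613],
]


def reconstruct(U, s, V):
    """Return U * diag(s) * V^T as a list of rows (hypothetical helper)."""
    return [
        [sum(U[i][k] * s[k] * V[j][k] for k in range(len(s)))
         for j in range(len(V))]
        for i in range(len(U))
    ]


def max_abs_diff(A, B):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


A_mahout = reconstruct(U_mahout, sigma, V_mahout)
A_r = reconstruct(U_r, sigma, V_r)

# The second columns differ only by sign between the two outputs.
col2_flipped = all(
    abs(m[1] + r[1]) < 1e-6
    for m, r in zip(U_mahout + V_mahout, U_r + V_r)
)

diff = max_abs_diff(A_mahout, A_r)
print("second column negated in both U and V:", col2_flipped)
print("max |A_mahout - A_r|:", diff)
```

The difference should be tiny compared to the matrix entries, since the only discrepancy left is R's printed precision.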
