Hm. There is a slight difference between R's full-rank SVD and Mahout's MR SSVD
in my unit test, modified for a 100x100 matrix with k=3, p=10.
First left singular vector (R first, then SSVD):
> s$u[,1]
[1] -0.050741660 -0.083985411 0.078767108 -0.044487425 -0.010380367
[6] 0.069635451 0.158337400 0.029102044 -0.168156173 -0.127921554
[11] 0.012698809 -0.027140724 0.069357925 -0.015605283 0.076614201
[16] -0.158582188 0.143656275 0.033886221 -0.055111330 -0.029299261
[21] 0.059667350 0.039205405 0.042027376 0.048541162 0.158267382
[26] -0.045441433 0.044529295 -0.038681358 -0.024035611 -0.054543123
[31] 0.027365365 -0.054029635 -0.021845631 0.053124795 0.050475680
[36] -0.093776477 0.094699229 -0.030911885 -0.169810667 0.149075410
[41] 0.102150407 0.165651229 0.175798233 -0.048390507 0.175243690
[46] -0.170793896 0.059918820 -0.132466003 -0.131783388 -0.178422266
[51] 0.079304233 -0.054428953 0.057820900 0.120791505 0.095287617
[56] 0.036671894 -0.081203386 0.153768112 0.014849405 0.027470798
[61] -0.064944829 -0.007538214 0.069034637 -0.133978151 -0.022290433
[66] -0.038094067 0.168947231 -0.100797474 -0.054253041 -0.040255069
[71] 0.124817481 -0.059689202 0.018821181 -0.131237426 -0.141223359
[76] 0.128026731 -0.170388319 0.080445852 0.071966615 -0.029745918
[81] 0.049479520 -0.121362268 -0.077338205 -0.061950828 -0.168851635
[86] -0.073192796 0.087453086 -0.085166577 0.160026655 -0.060816556
[91] 0.015420973 0.117780809 0.083415819 -0.160806975 0.171932591
[96] 0.170064367 0.001479280 -0.161878123 0.129685305 -0.104231610
> U[,1]
1 2 3 4 5 6
0.050741634 0.083985464 -0.078767344 0.044487660 0.010380470 -0.069635561
7 8 9 10 11 12
-0.158337117 -0.029102012 0.168156073 0.127921760 -0.012698756 0.027140487
13 14 15 16 17 18
-0.069358074 0.015605295 -0.076614050 0.158582091 -0.143656127 -0.033886485
19 20 21 22 23 24
0.055111560 0.029299084 -0.059667201 -0.039205182 -0.042027356 -0.048541087
25 26 27 28 29 30
-0.158267335 0.045441521 -0.044529241 0.038681577 0.024035604 0.054543106
31 32 33 34 35 36
-0.027365256 0.054029674 0.021845620 -0.053124833 -0.050475677 0.093776656
37 38 39 40 41 42
-0.094699463 0.030911730 0.169810791 -0.149075076 -0.102150266 -0.165651017
43 44 45 46 47 48
-0.175798375 0.048390265 -0.175243708 0.170793758 -0.059918703 0.132465938
49 50 51 52 53 54
0.131783579 0.178422152 -0.079304282 0.054428751 -0.057820999 -0.120791565
55 56 57 58 59 60
-0.095287586 -0.036671995 0.081203324 -0.153767938 -0.014849361 -0.027471027
61 62 63 64 65 66
0.064944979 0.007538413 -0.069034788 0.133978044 0.022290513 0.038094051
67 68 69 70 71 72
-0.168947352 0.100797649 0.054253165 0.040255237 -0.124817480 0.059689502
73 74 75 76 77 78
-0.018821295 0.131237429 0.141223597 -0.128027116 0.170388135 -0.080445760
79 80 81 82 83 84
-0.071966482 0.029745819 -0.049479559 0.121362303 0.077338278 0.061950724
85 86 87 88 89 90
0.168851648 0.073193002 -0.087453189 0.085166809 -0.160026464 0.060816590
91 92 93 94 95 96
-0.015421147 -0.117780975 -0.083415727 0.160806958 -0.171932343 -0.170064514
97 98 99 100
-0.001479434 0.161878089 -0.129685379 0.104231530
The same holds for the right singular vectors. The only difference is that
the sign flips between R's and Mahout's versions, which is expected since
singular vectors are only defined up to sign; otherwise they match almost
exactly.
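To compare the two outputs without the sign flip getting in the way, each
column can be reoriented before taking the difference. A minimal sketch in R,
assuming the reference vectors (e.g. s$u[, 1:k]) and the SSVD output are both
available as matrices; the names align_signs, U_ref and U_test are
hypothetical, and the toy data below only stands in for the real matrices:

```r
# Flip the sign of each column of U_test so that it points the same way
# as the matching column of U_ref (singular vectors are sign-ambiguous).
align_signs <- function(U_ref, U_test) {
  U_ref  <- as.matrix(U_ref)
  U_test <- as.matrix(U_test)
  # Sign of the inner product of each pair of matching columns.
  s <- sign(colSums(U_ref * U_test))
  s[s == 0] <- 1                      # leave orthogonal columns untouched
  sweep(U_test, 2, s, `*`)
}

# Toy check: the same three vectors with the first column negated.
set.seed(1)
A <- matrix(rnorm(100 * 100), 100, 100)
U_ref  <- svd(A)$u[, 1:3]
U_test <- U_ref
U_test[, 1] <- -U_test[, 1]

# Largest entrywise difference once the signs agree.
max_err <- max(abs(U_ref - align_signs(U_ref, U_test)))
print(max_err)
```

With the real data, max_err is the number to look at when deciding whether the
vectors are "significantly different" or just flipped.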
So yes, I am seeing some stochastic effects here with k and p this low.
Are you saying your errors are larger than these? I did not test the
sequential version with similar parameters.
One significant difference between the MR and sequential versions is that
the sequential version uses a ternary random matrix (instead of a uniform
one); that may affect accuracy a little.
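To get a feel for how much the choice of random matrix matters, here is a
rough sketch in R of a basic randomized SVD (no power iterations) run with a
uniform and a ternary projection matrix. This is not Mahout's code: rsvd,
uniform_omega and ternary_omega are hypothetical names, the two distributions
(uniform on [-1, 1], equal-probability {-1, 0, 1}) are assumptions, and the
test matrix is synthetic with a decaying spectrum.

```r
# Basic randomized SVD, parameterized by the generator of the
# n x (k+p) random projection matrix Omega.
rsvd <- function(A, k, p, make_omega) {
  Omega <- make_omega(ncol(A), k + p)
  Y <- A %*% Omega                    # sample the range of A
  Q <- qr.Q(qr(Y))                    # orthonormal basis, m x (k+p)
  B <- crossprod(Q, A)                # t(Q) %*% A, (k+p) x n
  s <- svd(B)
  list(d = s$d[1:k], u = (Q %*% s$u)[, 1:k], v = s$v[, 1:k])
}

# Two illustrative generators (assumed distributions, see above).
uniform_omega <- function(n, l) matrix(runif(n * l, -1, 1), n, l)
ternary_omega <- function(n, l) {
  matrix(sample(c(-1, 0, 1), n * l, replace = TRUE), n, l)
}

# Synthetic 100x100 matrix with singular values 10, 5, 2.5, ...
set.seed(42)
U0 <- qr.Q(qr(matrix(rnorm(100 * 100), 100, 100)))
V0 <- qr.Q(qr(matrix(rnorm(100 * 100), 100, 100)))
A  <- U0 %*% diag(10 * 0.5^(0:99)) %*% t(V0)
full <- svd(A)

results <- sapply(list(uniform = uniform_omega, ternary = ternary_omega),
                  function(gen) {
  r <- rsvd(A, k = 3, p = 10, gen)
  c(# relative error of the leading singular values
    sv_rel_err  = max(abs(r$d - full$d[1:3]) / full$d[1:3]),
    # |cosine| between matching left vectors, so sign flips do not count
    min_abs_cos = min(abs(colSums(full$u[, 1:3] * r$u))))
})
print(results)
```

The comparison uses the absolute cosine between matching vectors so that the
sign ambiguity does not show up as an error. How close the two columns of
results come out will depend on the spectrum of the input matrix.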
On Fri, Aug 31, 2012 at 10:55 PM, Ted Dunning <[email protected]> wrote:
> Can you provide your test code?
>
> What difference did you observe?
>
> Did you account for the fact that your matrix is small enough that it
> probably wasn't divided correctly?
>
> On Sat, Sep 1, 2012 at 1:27 AM, Ahmed Elgohary <[email protected]> wrote:
>
>> Hi,
>>
>> I used Mahout's stochastic SVD implementation to find the singular values
>> and the singular vectors of a small 99x100 matrix. Then I compared the
>> results to the singular values and the singular vectors obtained using the
>> svd function in Matlab and the single-threaded version of the SSVD. I got
>> pretty much the same singular values from all 3 implementations. However,
>> the singular vectors from Mahout's SSVD were significantly different. I tried
>> multiple values for the parameters P and Q, but that does not seem to solve
>> the problem. Does the MR implementation of the SSVD make extra approximations
>> beyond the single-threaded SSVD, so that their results might not be the same?
>> Any advice on how I can tune Mahout's SSVD to get the same singular vectors
>> as the single-threaded SSVD?
>>
>> thanks,
>>
>> --ahmed
>>