There is a question still i guess that we never looked at which is the effect of different kind of distribution samplers for the Omega matrix (0-mean uniform vs. normal vs. trinary) on small problems though, although small matrices is not what Mahout is about.
On Mon, Sep 3, 2012 at 12:23 PM, Dmitriy Lyubimov <[email protected]> wrote: > ok so i ran Ahmed's numbers with optimal, R sequential version (ssvd.R > on the wiki) and actual MR. Bottom line, both R ssvd and MR are > producing quite a bit of rubbish for these k,p parameters and the long > tailed spectrum beyond just 1st singular vector. Obviously it is hard > to assert the errors since i'd need a lot of runs to compare and each > run creates different output. > > here we go (1st digits of first 4 singular vectors respectively) > > optimal > s$u[1,1:4] > [1] -0.09866006 -0.13195226 -0.16620985 -0.04921311 > > ssvd.R > s <- ssvd.svd(A,k=30,p=2,qiter=1) >> ss$u[1,1:4] > [1] -0.09869577 -0.10368010 0.09543850 0.14122410 > > MR SSVD >> U[1,1:4] > V2 V3 V4 V5 > -0.09865305 -0.21822886 0.05959496 0.06561257 > > Of course both versions of SSVD produce exact result with k+p=100 > which means internal flow is correct. So i don't see necessary > discrepancies in the results between MR and ssvd.R in that fairly > corner case of things. > > So this just means that our choice of k,p and q is just too poor for > such a flat spectrum and such a small matrix beyond the first singular > vector. Can we do better? Sure. As mentioned, k=30,p=70 gives us exact > optimal solution: > >> ss <- ssvd.svd(A,k=30,p=70,qiter=0) >> ss$u[1,1:4] > [1] -0.09866006 0.13195226 0.16620985 -0.04921311 >> > > So this is a rebuttal on the Ahmed's statement >> I tried multiple values for the parameters P and Q but, that does not seem >> to solve >> the problem. > > No, it kinda does as demonstrated. > > Next. Can we try something in between to work such a bad case to get a > better trade-off between errors and effort on precision side? sure. > > First, It doesn't make sense to use p<15. in fact p=15 is the default > one for MR version. So: if you ran MR version with default parameters, > it should have been something closer to this: > > (I'll just run algorithm several times for a more consistent > assessment of error mean and variance) >> ss <- ssvd.svd(A,k=30,p=15,qiter=1) >> ss$u[1,1:4] > [1] -0.09867691 -0.15782129 -0.14318662 -0.01915917 > [1] -0.09868661 -0.16606151 -0.14881755 0.06277614 > [1] -0.09865008 -0.18475458 0.10799953 0.03643756 > [1] -0.09862877 0.16593100 0.15187904 -0.04660871 > ... > So this default set of parameters is quite an improvement although we > probably still get enough error to the right hand of 4th singular > vector. > (i assume vector sign is not important, it means sign of V is flipped > as well i guess). > > > Finally, we can further try to improve with more power iterations. > ss <- ssvd.svd(A,k=30,p=15,qiter=2) > ss$u[1,1:4] > > >> ss <- ssvd.svd(A,k=30,p=15,qiter=2) >> ss$u[1,1:4] > [1] -0.09866002 -0.13100879 0.16691754 -0.04754525 > [1] -0.09866011 -0.13027399 -0.16417364 -0.04453192 > [1] -0.09866004 -0.13453556 -0.16704920 -0.04517045 > [1] -0.09866006 -0.12905608 0.16230813 -0.04353147 > > Now that adds even more stability in the 4th vector, doesn't it? The > error mean is demonstrably decreased. > > Bottom line points. > > First, there's not much point to investigate corner cases -- small > matrices, flat spectrum... You may force SSVD to produce exact > solution though, but it kind of defeats the purpose, because you can > as easily run full rank SVD on a matrix in matlab. Your parameters > chosen still produce acceptable results for the first vector where > spectrum is trendy but less so where the spectrum is flat. > > Second, you can control your errors through oversampling (p) and power > iterations (q) and you get much flexibility here. You balance > precision with flops tradeoff. > > Third, in real life it doesn't make sense to have high accuracy > measuring lfat spectrum. Flat spectrum just means you don't have > trends in those directions, i.e. essentially a random noise. If you > have random noise, direction of that noise is usually of little > interest, but because spectrum (i.e. singular values) is measured > better with the method, you just can state that you have mostly noise > after certain n-th singular vector and chances are in some cases you > just wouldn't care. And if you do care, you still have a choice to > work the oversampling and perhaps q iterations. > > Fourth, unless k+p=rank(A), each run will produce slightly different > results and errors so it doesn't make sense to make any error > comparison just between single runs of the variations. Instead, it > makes sense to compare error mean and variations on a better number of > runs. > > -d > > On Sun, Sep 2, 2012 at 12:00 AM, Ted Dunning <[email protected]> wrote: >> Did Ahmed even use a power iteration? >> >> On Sun, Sep 2, 2012 at 1:35 AM, Dmitriy Lyubimov <[email protected]> wrote: >> >>> but there is still a concern in a sense that power iterations >>> should've helped more than they did. I'll take a closer look but it >>> will take me a while to figure if there's something we can improve >>> here. >>>
