The dense random matrices should do somewhat better than the sparse trinary kind.
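(For concreteness, the three Omega samplers under discussion below could be drawn like this in R. This is an illustrative sketch only, not Mahout code; the function names and the trinary density are assumptions, not Mahout's actual choices.)

  # dense standard normal entries
  omega.normal  <- function(n, kp) matrix(rnorm(n * kp), n, kp)

  # dense 0-mean uniform entries on [-1, 1]
  omega.uniform <- function(n, kp) matrix(runif(n * kp, -1, 1), n, kp)

  # sparse trinary entries -1/0/+1, mostly zeros (the density here is a guess)
  omega.trinary <- function(n, kp, density = 1/3) {
    matrix(sample(c(-1, 0, 1), n * kp, replace = TRUE,
                  prob = c(density / 2, 1 - density, density / 2)), n, kp)
  }

(The sparse trinary option trades some sampling quality for much cheaper multiplication against A, which is why it is attractive at scale.)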
On Mon, Sep 3, 2012 at 3:48 PM, Dmitriy Lyubimov <[email protected]> wrote:
> There is still a question, I guess, that we never looked at, which is the
> effect of different kinds of distribution samplers for the Omega matrix
> (0-mean uniform vs. normal vs. trinary) on small problems, although small
> matrices are not what Mahout is about.
>
> On Mon, Sep 3, 2012 at 12:23 PM, Dmitriy Lyubimov <[email protected]> wrote:
> > OK, so I ran Ahmed's numbers with the optimal SVD, the sequential R
> > version (ssvd.R on the wiki), and the actual MR version. Bottom line:
> > both the R ssvd and MR produce quite a bit of rubbish for these k,p
> > parameters and this long-tailed spectrum, beyond just the 1st singular
> > vector. Obviously it is hard to assert the errors, since I'd need a lot
> > of runs to compare and each run creates different output.
> >
> > Here we go (first components of the first 4 left singular vectors,
> > respectively):
> >
> > optimal
> > s$u[1,1:4]
> > [1] -0.09866006 -0.13195226 -0.16620985 -0.04921311
> >
> > ssvd.R
> >> ss <- ssvd.svd(A,k=30,p=2,qiter=1)
> >> ss$u[1,1:4]
> > [1] -0.09869577 -0.10368010  0.09543850  0.14122410
> >
> > MR SSVD
> >> U[1,1:4]
> >          V2          V3          V4          V5
> > -0.09865305 -0.21822886  0.05959496  0.06561257
> >
> > Of course, both versions of SSVD produce the exact result with k+p=100,
> > which means the internal flow is correct. So I don't see any inherent
> > discrepancies in the results between MR and ssvd.R in this fairly
> > extreme corner case.
> >
> > So this just means that our choice of k, p and q is simply too poor for
> > such a flat spectrum and such a small matrix, beyond the first singular
> > vector. Can we do better? Sure. As mentioned, k=30, p=70 gives us the
> > exact optimal solution:
> >
> >> ss <- ssvd.svd(A,k=30,p=70,qiter=0)
> >> ss$u[1,1:4]
> > [1] -0.09866006  0.13195226  0.16620985 -0.04921311
> >
> > So this is a rebuttal of Ahmed's statement:
> >> I tried multiple values for the parameters P and Q but that does not
> >> seem to solve the problem.
> >
> > No, it kind of does, as demonstrated.
> >
> > Next: can we try something in between to work such a bad case and get a
> > better trade-off between error and the effort spent on precision? Sure.
> >
> > First, it doesn't make sense to use p<15; in fact p=15 is the default
> > for the MR version. So if you ran the MR version with default
> > parameters, it should have been something closer to this (I'll just run
> > the algorithm several times for a more consistent assessment of error
> > mean and variance):
> >
> >> ss <- ssvd.svd(A,k=30,p=15,qiter=1)
> >> ss$u[1,1:4]
> > [1] -0.09867691 -0.15782129 -0.14318662 -0.01915917
> > [1] -0.09868661 -0.16606151 -0.14881755  0.06277614
> > [1] -0.09865008 -0.18475458  0.10799953  0.03643756
> > [1] -0.09862877  0.16593100  0.15187904 -0.04660871
> > ...
> >
> > So this default set of parameters is quite an improvement, although we
> > probably still get substantial error to the right of the 4th singular
> > vector. (I assume the vector sign is not important; it means the sign
> > of V is flipped as well, I guess.)
> >
> > Finally, we can try to improve further with more power iterations:
> >
> >> ss <- ssvd.svd(A,k=30,p=15,qiter=2)
> >> ss$u[1,1:4]
> > [1] -0.09866002 -0.13100879  0.16691754 -0.04754525
> > [1] -0.09866011 -0.13027399 -0.16417364 -0.04453192
> > [1] -0.09866004 -0.13453556 -0.16704920 -0.04517045
> > [1] -0.09866006 -0.12905608  0.16230813 -0.04353147
> >
> > Now that adds even more stability in the 4th vector, doesn't it? The
> > error mean is demonstrably decreased.
> >
> > Bottom-line points.
> >
> > First, there's not much point in investigating corner cases -- small
> > matrices, flat spectrum... You may force SSVD to produce the exact
> > solution, but that kind of defeats the purpose, because you could just
> > as easily run a full-rank SVD on such a matrix in Matlab. The
> > parameters you chose still produce acceptable results for the first
> > vector, where the spectrum is trendy, but less so where the spectrum
> > is flat.
> >
> > Second, you can control your errors through oversampling (p) and power
> > iterations (q), and you get much flexibility here: you balance
> > precision against flops.
> >
> > Third, in real life it doesn't make sense to measure a flat spectrum
> > with high accuracy. A flat spectrum just means you don't have trends
> > in those directions, i.e. essentially random noise. If you have random
> > noise, the direction of that noise is usually of little interest; but
> > because the spectrum (i.e. the singular values) is measured more
> > accurately by the method, you can just state that you have mostly
> > noise after a certain n-th singular vector, and chances are in some
> > cases you just won't care. And if you do care, you still have the
> > choice to work the oversampling and perhaps the q iterations.
> >
> > Fourth, unless k+p=rank(A), each run will produce slightly different
> > results and errors, so it doesn't make sense to make any error
> > comparison between single runs of the variations. Instead, it makes
> > sense to compare error mean and variance over a larger number of runs.
> >
> > -d
> >
> > On Sun, Sep 2, 2012 at 12:00 AM, Ted Dunning <[email protected]> wrote:
> >> Did Ahmed even use a power iteration?
> >>
> >> On Sun, Sep 2, 2012 at 1:35 AM, Dmitriy Lyubimov <[email protected]> wrote:
> >>
> >>> But there is still a concern, in the sense that power iterations
> >>> should've helped more than they did. I'll take a closer look, but it
> >>> will take me a while to figure out if there's something we can
> >>> improve here.
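
(For readers following the thread: the scheme being tuned above, with rank k, oversampling p, and power iterations qiter, is essentially the standard randomized SVD recipe of Halko et al. Below is a minimal R sketch of that recipe, assuming a plain in-memory matrix A. The name rsvd.sketch is made up here; this is not the actual ssvd.R from the wiki, just the same algorithm with the same knobs.)

  rsvd.sketch <- function(A, k, p = 15, qiter = 0) {
    # random Gaussian projection: n x (k+p) Omega matrix
    omega <- matrix(rnorm(ncol(A) * (k + p)), ncol(A), k + p)
    # orthonormal basis for the sampled range of A
    Q <- qr.Q(qr(A %*% omega))
    # power iterations sharpen the basis when the spectrum is flat
    for (i in seq_len(qiter)) {
      Q <- qr.Q(qr(t(A) %*% Q))
      Q <- qr.Q(qr(A %*% Q))
    }
    # exact SVD of the small (k+p) x ncol(A) projected problem
    s <- svd(t(Q) %*% A)
    list(u = (Q %*% s$u)[, 1:k], d = s$d[1:k], v = s$v[, 1:k])
  }

With this sketch, ss <- rsvd.sketch(A, k=30, p=15, qiter=2) corresponds to the parameter setting of the last set of runs above, and raising p toward rank(A) - k recovers the exact solution, as demonstrated with p=70.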
