On Sun, Sep 2, 2012 at 12:26 AM, Ahmed Elgohary <[email protected]> wrote:
> - I am using k = 30 and p = 2 so (k+p) < 99 (Rank(A))
> - I am attaching the csv file of the matrix A

Brilliant. And the attachment actually made it through.

> - yes, the difference is significant. Here is the output of the sequential
> SSVD:

Indeed. The difference in the sequential case looks very big. Is it
possible that you have nearly identical singular values? That could leave
the singular vectors under-determined.

I just looked at this using R. Since it uses LAPACK just like Matlab, the
singular vectors I got are very similar to yours:

> s$u[1,1:10]
 [1] -0.09866006 -0.13195226 0.16620985 0.04921311 -0.18280221 0.11559161 -0.06775417 0.05038383
 [9] -0.01597058 0.03498753

But look at the spectrum:

> s$d[1:30]
 [1] 49.944662 5.396441 5.363498 5.162772 5.124997 5.031488 4.952310 4.837590 4.789828 4.718988
[11] 4.673337 4.576722 4.489299 4.464587 4.423230 4.349272 4.233963 4.069206 4.061468 4.028808
[21] 3.977288 3.927101 3.867809 3.771824 3.715605 3.646800 3.522127 3.494618 3.446958 3.412861

After the first singular value, you have lots of nearly identical values.
This indicates that you probably need to use a power method to
pre-condition the data. In a large-scale application, it typically takes
much longer before you hit a flat plateau like this, which is what makes
SSVD so economical.

Also, if I step through the basic algorithm in R, it is easy to see that
the basic assumptions of the stochastic projection are violated:

> Y = A %*% matrix(rnorm(3200), 100, 32)
> qr1 = qr(Y)
> norm(qr.Q(qr1) %*% t(qr.Q(qr1)) %*% A - A)
[1] 20.83509
> norm(A)
[1] 58.34794

The issue here is that the random projection is not a very good
approximation of A. This happens when the singular value spectrum is still
large at the cutoff:

> s$d[30:99]
 [1] 3.41286101 3.38844993 3.32769425 3.27354388 3.20851301 3.16175529 3.08556715 3.00915717 2.95061044
[10] 2.92376037 2.87917470 2.82316676 2.75040129 2.66700067 2.61057954 2.58533645 2.53817660 2.49961198
[19] 2.45114845 2.37234167 2.32895166 2.28347210 2.25993598 2.25411242 2.20475212 2.14491854 2.06434148
[28] 2.05223478 1.98081081 1.94691237 1.92036956 1.83571526 1.75956488 1.69612907 1.67369404 1.58851679
[37] 1.54144415 1.51821081 1.45912647 1.45063011 1.41732925 1.35655483 1.31333389 1.27988382 1.18840667
[46] 1.14900347 1.10145851 1.08105797 1.04054897 0.96239289 0.94764227 0.86479263 0.82724565 0.79683667
[55] 0.71886437 0.68518593 0.63232310 0.57656816 0.57076929 0.51649318 0.43080840 0.39319166 0.37404507
[64] 0.34126503 0.31046386 0.25922183 0.17239248 0.10925979 0.07458005 0.02792311

So this is the basic problem. If we do even just one or two power
iterations, things look vastly better. (The 6e-16 result below comes from
taking the QR of A1 itself, as a sanity check; the rest use the sampled
Y's.)

> norm(qr.Q(qr1) %*% t(qr.Q(qr1)) %*% A - A)/norm(A)
[1] 0.3570835
> A1 = A %*% t(A) %*% A
> Y1 = A1 %*% matrix(rnorm(3200), 100, 32)
> qr1.1 = qr(A1)
> norm(qr.Q(qr1.1) %*% t(qr.Q(qr1.1)) %*% A1 - A1)/norm(A1)
[1] 6.076296e-16
> qr1.1 = qr(Y1)
> norm(qr.Q(qr1.1) %*% t(qr.Q(qr1.1)) %*% A1 - A1)/norm(A1)
[1] 0.00156115
> A2 = A %*% t(A) %*% A %*% t(A) %*% A
> Y2 = A2 %*% matrix(rnorm(3200), 100, 32)
> qr1.2 = qr(Y2)
> norm(qr.Q(qr1.2) %*% t(qr.Q(qr1.2)) %*% A2 - A2)/norm(A2)
[1] 8.318334e-06

But this is just going to help find the larger singular vectors... the
reconstruction of A itself is still going to be problematic.
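To make the pattern explicit, here is the same experiment wrapped up as a
small R function, parameterized by the number of power iterations q. The
function name and signature are my own for illustration; it measures how
well projection onto the sampled subspace reconstructs A itself, not the
powered matrix:

projection_error <- function(A, k_plus_p, q = 0) {
  # Gaussian test matrix, as in the session above
  Omega <- matrix(rnorm(ncol(A) * k_plus_p), ncol(A), k_plus_p)
  Y <- A %*% Omega
  for (i in seq_len(q)) {
    Y <- A %*% (t(A) %*% Y)   # one power iteration: Y <- (A %*% t(A)) %*% Y
  }
  Q <- qr.Q(qr(Y))            # orthonormal basis for the range of Y
  norm(Q %*% (t(Q) %*% A) - A) / norm(A)
}

projection_error(A, 32, 0)    # should land near the 0.357 above
projection_error(A, 32, 2)    # better, but floored by the flat tail

No matter how many iterations you run, the error can't drop much below the
level set by the singular values past the cutoff, which is why this
spectrum is such a hard case.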
For a more interesting test of the algorithm, within the domain where it
is likely to work well, I would recommend starting with a definition of A
that has known singular values. Pick U and V at random, engineer S as
desired, and then try this again; a sketch is at the end of this message.
And remember that SSVD isn't intended for small problems.

> I am not sure what you mean:
> "Did you account for the fact that your matrix is small enough that it
> probably wasn't divided correctly?"
> I was worried that data might not get well split between mappers.

That isn't your problem.
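Along the lines of the suggestion above, here is a sketch of building a
test matrix with a fully known spectrum. The sizes and the shape of the
decay are placeholder choices of mine:

# random orthonormal U and V from the QR of Gaussian matrices
n <- 1000; m <- 1000; r <- 100
U <- qr.Q(qr(matrix(rnorm(n * r), n, r)))   # n x r, orthonormal columns
V <- qr.Q(qr(matrix(rnorm(m * r), m, r)))   # m x r, orthonormal columns
s <- 100 * exp(-(1:r) / 10)                 # engineered, decaying spectrum
A <- U %*% diag(s) %*% t(V)                 # n x m with known singular values
# svd(A)$d[1:r] should match s to within floating point error

With a spectrum you control, you can check directly how the projection
error tracks the singular values past the k+p cutoff.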
