The singular values in my experiments drop like a rock. What is the information/probability-loss formula when dropping the low-value singular vectors?

That is, I start with a 7D vector set, go through this random projection/SVD exercise, and get these singular values: [0.7, 0.2, 0.05, 0.02, 0.01, 0.01, 0.01]. I drop the last five to create a matrix that gives 2D coordinates. The sum of the remaining singular values is 0.9. My 2D vectors will be missing 0.10 of *something* compared to the original 7D vectors. What is this something, and what other concepts does it plug into?
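In code form, the bookkeeping I mean is just this (R, in the style of your examples; whether the dropped fraction of the singular-value total is the right way to measure the loss is exactly what I'm asking):

# the toy numbers from above
d = c(0.7, 0.2, 0.05, 0.02, 0.01, 0.01, 0.01)
k = 2                        # dimensions kept
kept = sum(d[1:k]) / sum(d)  # 0.9: fraction of the singular-value total retained
lost = 1 - kept              # 0.1: the *something* that goes missing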
Lance

On Sat, Jul 2, 2011 at 11:54 PM, Lance Norskog <[email protected]> wrote:
> The singular values on my recommender vectors come out: 90, 10, 1.2,
> 1.1, 1.0, 0.95..... This was playing with your R code. Based on this,
> I'm adding the QR stuff to my visualization toolkit.
>
> Lance
>
> On Sat, Jul 2, 2011 at 10:15 PM, Lance Norskog <[email protected]> wrote:
>> All pairwise distances are preserved? There must be a qualifier on
>> pairwise distance algorithms.
>>
>> On Sat, Jul 2, 2011 at 6:49 PM, Lance Norskog <[email protected]> wrote:
>>> Cool. The plots are fun. The way Gaussian spots project into spinning
>>> chains is very educational about entropy.
>>>
>>> For full Random Projection, a lame random number generator
>>> (java.lang.Random) will generate a higher standard deviation than a
>>> high-quality one like MurmurHash.
>>>
>>> On Fri, Jul 1, 2011 at 5:25 PM, Ted Dunning <[email protected]> wrote:
>>>> Here is R code that demonstrates what I mean by stunning (aka 15
>>>> significant figures). Note that I only check dot products here. From
>>>> the fact that the final transform is orthonormal we know that all
>>>> distances are preserved.
>>>>
>>>> # make a big random matrix with rank 20
>>>>> a = matrix(rnorm(20000), ncol=20) %*% matrix(rnorm(20000), nrow=20);
>>>>> dim(a)
>>>> [1] 1000 1000
>>>> # random projection
>>>>> y = a %*% matrix(rnorm(30000), ncol=30);
>>>> # get basis for y
>>>>> q = qr.Q(qr(y))
>>>>> dim(q)
>>>> [1] 1000 30
>>>> # re-project b, do svd on result
>>>>> b = t(q) %*% a
>>>>> v = svd(b)$v
>>>>> d = svd(b)$d
>>>> # note how singular values drop like a stone at index 21
>>>>> plot(d)
>>>>> dim(v)
>>>> [1] 1000 30
>>>> # finish svd just for fun
>>>>> u = q %*% svd(b)$u
>>>>> dim(u)
>>>> [1] 1000 30
>>>> # u is orthogonal, right?
>>>>> diag(t(u) %*% u)
>>>> [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>>>> # and u diag(d) v' reconstructs a very precisely, right?
>>>>> max(abs(a - u %*% diag(d) %*% t(v)))
>>>> [1] 1.293188e-12
>>>>
>>>> # now project a into the reduced dimensional space
>>>>> aa = a %*% v
>>>>> dim(aa)
>>>> [1] 1000 30
>>>> # check a few dot products
>>>>> sum(aa[1,] %*% aa[2,])
>>>> [1] 6835.152
>>>>> sum(a[1,] %*% a[2,])
>>>> [1] 6835.152
>>>>> sum(a[1,] %*% a[3,])
>>>> [1] 3337.248
>>>>> sum(aa[1,] %*% aa[3,])
>>>> [1] 3337.248
>>>>
>>>> # wow, that's close. let's try a hundred dot products
>>>>> dot1 = rep(0,100); dot2 = rep(0,100)
>>>>> for (i in 1:100) {dot1[i] = sum(a[1,] * a[i,]); dot2[i] = sum(aa[1,] * aa[i,])}
>>>>
>>>> # how close to the same are those?
>>>>> max(abs(dot1 - dot2))
>>>> # VERY
>>>> [1] 3.45608e-11
>>>>
>>>> On Fri, Jul 1, 2011 at 4:54 PM, Ted Dunning <[email protected]> wrote:
>>>>
>>>>> Yes. Been there. Done that.
>>>>>
>>>>> The correlation is stunningly good.
>>>>>
>>>>> On Fri, Jul 1, 2011 at 4:22 PM, Lance Norskog <[email protected]> wrote:
>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Is this true? - "Preserving pairwise distances" means the relative
>>>>>> distances. So the ratios of new distance:old distance should be
>>>>>> similar. The standard deviation of the ratios gives a rough-and-ready
>>>>>> measure of the fidelity of the reduction. The standard deviation of
>>>>>> simple RP should be highest, then this RP + orthogonalization, then MDS.
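(Interjecting on my own question just above: the ratio check I have in mind would look roughly like this in R, reusing the a and aa from Ted's transcript earlier in the quote. Untested sketch; dist() and sd() are base R.)

# pairwise distances before and after the reduction
d.orig = as.vector(dist(a))    # distances between rows of the original matrix
d.red  = as.vector(dist(aa))   # distances between rows of the reduced matrix
# if distances are preserved, the ratios sit near 1;
# their standard deviation is the rough fidelity number I mean
ratios = d.red / d.orig
sd(ratios)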
>>>>>> On Fri, Jul 1, 2011 at 11:03 AM, Ted Dunning <[email protected]> wrote:
>>>>>> > Lance,
>>>>>> >
>>>>>> > You would get better results from the random projection if you did the
>>>>>> > first part of the stochastic SVD. Basically, you do the random projection:
>>>>>> >
>>>>>> > Y = A \Omega
>>>>>> >
>>>>>> > where A is your original data, \Omega is the random matrix and Y is the
>>>>>> > result. Y will be tall and skinny.
>>>>>> >
>>>>>> > Then, find an orthogonal basis Q of Y:
>>>>>> >
>>>>>> > Q R = Y
>>>>>> >
>>>>>> > This orthogonal basis will be very close to the orthogonal basis of A. In
>>>>>> > fact, there are strong probabilistic guarantees on how good Q is as a
>>>>>> > basis of A. Next, you project A using the transpose of Q:
>>>>>> >
>>>>>> > B = Q' A
>>>>>> >
>>>>>> > This gives you a short fat matrix that is the projection of A into a lower
>>>>>> > dimensional space. Since this is a left projection, it isn't quite what
>>>>>> > you want in your work, but it is the standard way to phrase things. The
>>>>>> > exact same thing can be done with left random projection:
>>>>>> >
>>>>>> > Y = \Omega A
>>>>>> > L Q = Y
>>>>>> > B = A Q'
>>>>>> >
>>>>>> > In this form, B is tall and skinny as you would like and Q' is essentially
>>>>>> > an orthogonal reformulation of the random projection. This projection is
>>>>>> > about as close as you are likely to get to something that exactly
>>>>>> > preserves distances. As such, you should be able to use MDS on B to get
>>>>>> > exactly the same results as you want.
>>>>>> >
>>>>>> > Additionally, if you start with the original form and do an SVD of B
>>>>>> > (which is fast), you will get a very good approximation of the prominent
>>>>>> > right singular vectors of A. If you do that, the first few of these
>>>>>> > should be about as good as MDS for visualization purposes.
>>>>>> >
>>>>>> > On Fri, Jul 1, 2011 at 2:44 AM, Lance Norskog <[email protected]> wrote:
>>>>>> >
>>>>>> >> I did some testing and made a lot of pretty charts:
>>>>>> >>
>>>>>> >> http://ultrawhizbang.blogspot.com/
>>>>>> >>
>>>>>> >> If you want to get quick visualizations of your clusters, this is a
>>>>>> >> great place to start.

--
Lance Norskog
[email protected]
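P.S. Ted, to check that I follow the left-projection recipe in your earlier mail: against the example matrix a from your transcript, I read it as something like the sketch below. The variable names are mine and I have not run it, so correct me if the qr-of-the-transpose trick is not what you meant for the L Q step.

# my reading of Y = \Omega A, L Q = Y, B = A Q'  (untested)
omega = matrix(rnorm(30 * nrow(a)), nrow=30)   # 30 x 1000 random matrix
y = omega %*% a                                 # 30 x 1000, short and fat
qt = qr.Q(qr(t(y)))                             # 1000 x 30; the columns of qt are Q'
b = a %*% qt                                    # 1000 x 30, tall and skinny, as I want
dim(b)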
