I wasn't thinking when I typed that post. An orthonormal projection always preserves distances since it is just a generalized reflection/rotation. Preserving all dot products (including each vector's dot product with itself) also implies that distances are preserved, because |x - y|_2^2 = x \cdot x - 2 x \cdot y + y \cdot y.
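To make that concrete, here is a quick R check (a toy example of mine, not part of the quoted thread below): multiplying by a random orthonormal matrix leaves dot products, and therefore distances, unchanged up to rounding.

# toy check: an orthonormal Q preserves dot products and distances
set.seed(1)
x = rnorm(50); y = rnorm(50)
Q = qr.Q(qr(matrix(rnorm(50 * 50), 50, 50)))   # random 50x50 orthonormal matrix
# dot product before vs. after applying Q (difference should be ~1e-15)
sum(x * y) - sum((Q %*% x) * (Q %*% y))
# the identity |x - y|^2 = x.x - 2 x.y + y.y (difference should be ~0)
sum((x - y)^2) - (sum(x * x) - 2 * sum(x * y) + sum(y * y))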
On Sat, Jul 2, 2011 at 10:15 PM, Lance Norskog <[email protected]> wrote:
> All pairwise distances are preserved? There must be a qualifier on
> pairwise distance algorithms.
>
> On Sat, Jul 2, 2011 at 6:49 PM, Lance Norskog <[email protected]> wrote:
> > Cool. The plots are fun. The way gaussian spots project into spinning
> > chains is very educational about entropy.
> >
> > For full Random Projection, a lame random number generator
> > (java.lang.Random) will generate a higher standard deviation than a
> > high-quality one like MurmurHash.
> >
> > On Fri, Jul 1, 2011 at 5:25 PM, Ted Dunning <[email protected]> wrote:
> >> Here is R code that demonstrates what I mean by stunning (aka 15 significant
> >> figures). Note that I only check dot products here. From the fact that the
> >> final transform is orthonormal we know that all distances are preserved.
> >>
> >> # make a big random matrix with rank 20
> >>> a = matrix(rnorm(20000), ncol=20) %*% matrix(rnorm(20000), nrow=20);
> >>> dim(a)
> >> [1] 1000 1000
> >> # random projection
> >>> y = a %*% matrix(rnorm(30000), ncol=30);
> >> # get basis for y
> >>> q = qr.Q(qr(y))
> >>> dim(q)
> >> [1] 1000 30
> >> # re-project a to get b, do svd on the result
> >>> b = t(q) %*% a
> >>> v = svd(b)$v
> >>> d = svd(b)$d
> >> # note how singular values drop like a stone at index 21
> >>> plot(d)
> >>> dim(v)
> >> [1] 1000 30
> >> # finish svd just for fun
> >>> u = q %*% svd(b)$u
> >>> dim(u)
> >> [1] 1000 30
> >> # u is orthogonal, right?
> >>> diag(t(u) %*% u)
> >> [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
> >> # and u diag(d) v' reconstructs a very precisely, right?
> >>> max(abs(a - u %*% diag(d) %*% t(v)))
> >> [1] 1.293188e-12
> >>
> >> # now project a into the reduced dimensional space
> >>> aa = a %*% v
> >>> dim(aa)
> >> [1] 1000 30
> >> # check a few dot products
> >>> sum(aa[1,] %*% aa[2,])
> >> [1] 6835.152
> >>> sum(a[1,] %*% a[2,])
> >> [1] 6835.152
> >>> sum(a[1,] %*% a[3,])
> >> [1] 3337.248
> >>> sum(aa[1,] %*% aa[3,])
> >> [1] 3337.248
> >>
> >> # wow, that's close. let's try a hundred dot products
> >>> dot1 = rep(0,100); dot2 = rep(0,100)
> >>> for (i in 1:100) {dot1[i] = sum(a[1,] * a[i,]); dot2[i] = sum(aa[1,] * aa[i,])}
> >>
> >> # how close to the same are those?
> >>> max(abs(dot1 - dot2))
> >> # VERY
> >> [1] 3.45608e-11
> >>
> >> On Fri, Jul 1, 2011 at 4:54 PM, Ted Dunning <[email protected]> wrote:
> >>> Yes. Been there. Done that.
> >>>
> >>> The correlation is stunningly good.
> >>>
> >>> On Fri, Jul 1, 2011 at 4:22 PM, Lance Norskog <[email protected]> wrote:
> >>>> Thanks!
> >>>>
> >>>> Is this true? - "Preserving pairwise distances" means the relative
> >>>> distances. So the ratios of new distance:old distance should be
> >>>> similar. The standard deviation of the ratios gives a rough&ready
> >>>> measure of the fidelity of the reduction. The standard deviation of
> >>>> simple RP should be highest, then this RP + orthogonalization, then
> >>>> MDS.
> >>>>
> >>>> On Fri, Jul 1, 2011 at 11:03 AM, Ted Dunning <[email protected]> wrote:
> >>>> > Lance,
> >>>> >
> >>>> > You would get better results from the random projection if you did the
> >>>> > first part of the stochastic SVD. Basically, you do the random projection:
> >>>> >
> >>>> > Y = A \Omega
> >>>> >
> >>>> > where A is your original data, \Omega is the random matrix and Y is the result.
> >>>> > Y will be tall and skinny.
> >>>> >
> >>>> > Then, find an orthogonal basis Q of Y:
> >>>> >
> >>>> > Q R = Y
> >>>> >
> >>>> > This orthogonal basis will be very close to the orthogonal basis of A. In
> >>>> > fact, there are strong probabilistic guarantees on how good Q is as a basis
> >>>> > of A. Next, you project A using the transpose of Q:
> >>>> >
> >>>> > B = Q' A
> >>>> >
> >>>> > This gives you a short fat matrix that is the projection of A into a lower
> >>>> > dimensional space. Since this is a left projection, it isn't quite what you
> >>>> > want in your work, but it is the standard way to phrase things. The exact
> >>>> > same thing can be done with left random projection:
> >>>> >
> >>>> > Y = \Omega A
> >>>> > L Q = Y
> >>>> > B = A Q'
> >>>> >
> >>>> > In this form, B is tall and skinny as you would like and Q' is essentially
> >>>> > an orthogonal reformulation of the random projection. This projection is
> >>>> > about as close as you are likely to get to something that exactly preserves
> >>>> > distances. As such, you should be able to use MDS on B to get exactly the
> >>>> > same results as you want.
> >>>> >
> >>>> > Additionally, if you start with the original form and do an SVD of B (which
> >>>> > is fast), you will get a very good approximation of the prominent right
> >>>> > singular vectors of A. If you do that, the first few of these should be
> >>>> > about as good as MDS for visualization purposes.
> >>>> >
> >>>> > On Fri, Jul 1, 2011 at 2:44 AM, Lance Norskog <[email protected]> wrote:
> >>>> >
> >>>> >> I did some testing and made a lot of pretty charts:
> >>>> >>
> >>>> >> http://ultrawhizbang.blogspot.com/
> >>>> >>
> >>>> >> If you want to get quick visualizations of your clusters, this is a
> >>>> >> great place to start.
> >>>> >>
> >>>> >> --
> >>>> >> Lance Norskog
> >>>> >> [email protected]
> >>>>
> >>>> --
> >>>> Lance Norskog
> >>>> [email protected]
> >
> > --
> > Lance Norskog
> > [email protected]
>
> --
> Lance Norskog
> [email protected]
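For completeness: the left-projection variant described in the quoted thread (Y = \Omega A, L Q = Y, B = A Q') never appears as code above, so here is a rough R sketch of it. The sizes and variable names are mine, and the LQ factors come from a QR of the transpose.

# sketch of the left-projection form: B = A Q' is tall and skinny
a = matrix(rnorm(20000), ncol=20) %*% matrix(rnorm(20000), nrow=20)  # 1000x1000, rank 20
omega = matrix(rnorm(30 * 1000), nrow=30)   # 30x1000 random matrix
y = omega %*% a                             # Y = \Omega A, short and fat (30x1000)
qt = qr.Q(qr(t(y)))                         # 1000x30; t(qt) is the orthonormal Q in L Q = Y
b = a %*% qt                                # B = A Q', tall and skinny (1000x30)
dim(b)
# row dot products of a should survive the projection into b
sum(a[1,] * a[2,])
sum(b[1,] * b[2,])

Since the toy matrix has exact rank 20 and omega has 30 rows, the two dot products should agree to roughly the same precision as in the right-projection demo above.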
