I whipped up a MurmurHash-based Random and the Gaussian plot is much cleaner. The MurmurHash version is just as fast as java.util.Random: j.u.R produces a new int on each cycle, while the MurmurHash version produces a new long, so it needs half as many cycles.
On Sun, Jul 3, 2011 at 12:12 AM, Ted Dunning <[email protected]> wrote:
> I wasn't thinking when I typed that post. An orthonormal projection always preserves distances since it is just a generalized reflection/rotation. Preserving all dot products (including to self) also implies distances are preserved because |x-y|_2^2 = x \cdot x - 2 x \cdot y + y \cdot y.
>
> On Sat, Jul 2, 2011 at 10:15 PM, Lance Norskog <[email protected]> wrote:
>> All pairwise distances are preserved? There must be a qualifier on pairwise distance algorithms.
>>
>> On Sat, Jul 2, 2011 at 6:49 PM, Lance Norskog <[email protected]> wrote:
>>> Cool. The plots are fun. The way Gaussian spots project into spinning chains is very educational about entropy.
>>>
>>> For full Random Projection, a lame random number generator (java.util.Random) will generate a higher standard deviation than a high-quality one like MurmurHash.
>>>
>>> On Fri, Jul 1, 2011 at 5:25 PM, Ted Dunning <[email protected]> wrote:
>>>> Here is R code that demonstrates what I mean by stunning (aka 15 significant figures). Note that I only check dot products here. From the fact that the final transform is orthonormal we know that all distances are preserved.
>>>>
>>>> # make a big random matrix with rank 20
>>>> > a = matrix(rnorm(20000), ncol=20) %*% matrix(rnorm(20000), nrow=20);
>>>> > dim(a)
>>>> [1] 1000 1000
>>>> # random projection
>>>> > y = a %*% matrix(rnorm(30000), ncol=30);
>>>> # get basis for y
>>>> > q = qr.Q(qr(y))
>>>> > dim(q)
>>>> [1] 1000 30
>>>> # re-project a to get b, do svd on the result
>>>> > b = t(q) %*% a
>>>> > v = svd(b)$v
>>>> > d = svd(b)$d
>>>> # note how singular values drop like a stone at index 21
>>>> > plot(d)
>>>> > dim(v)
>>>> [1] 1000 30
>>>> # finish svd just for fun
>>>> > u = q %*% svd(b)$u
>>>> > dim(u)
>>>> [1] 1000 30
>>>> # u is orthogonal, right?
>>>> > diag(t(u) %*% u)
>>>> [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>>>> # and u diag(d) v' reconstructs a very precisely, right?
>>>> > max(abs(a - u %*% diag(d) %*% t(v)))
>>>> [1] 1.293188e-12
>>>>
>>>> # now project a into the reduced dimensional space
>>>> > aa = a %*% v
>>>> > dim(aa)
>>>> [1] 1000 30
>>>> # check a few dot products
>>>> > sum(aa[1,] %*% aa[2,])
>>>> [1] 6835.152
>>>> > sum(a[1,] %*% a[2,])
>>>> [1] 6835.152
>>>> > sum(a[1,] %*% a[3,])
>>>> [1] 3337.248
>>>> > sum(aa[1,] %*% aa[3,])
>>>> [1] 3337.248
>>>>
>>>> # wow, that's close. let's try a hundred dot products
>>>> > dot1 = rep(0,100); dot2 = rep(0,100)
>>>> > for (i in 1:100) {dot1[i] = sum(a[1,] * a[i,]); dot2[i] = sum(aa[1,] * aa[i,])}
>>>>
>>>> # how close to the same are those?
>>>> > max(abs(dot1 - dot2))
>>>> # VERY
>>>> [1] 3.45608e-11
>>>>
>>>> On Fri, Jul 1, 2011 at 4:54 PM, Ted Dunning <[email protected]> wrote:
>>>>> Yes. Been there. Done that.
>>>>>
>>>>> The correlation is stunningly good.
>>>>>
>>>>> On Fri, Jul 1, 2011 at 4:22 PM, Lance Norskog <[email protected]> wrote:
>>>>>> Thanks!
>>>>>>
>>>>>> Is this true? - "Preserving pairwise distances" means the relative distances, so the ratios of new distance to old distance should be similar. The standard deviation of the ratios gives a rough-and-ready measure of the fidelity of the reduction. The standard deviation of simple RP should be highest, then this RP + orthogonalization, then MDS.
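To make that ratio check concrete, here is a minimal R sketch. It assumes, as in Ted's session above, that the rows of a are the original points and the rows of aa are their reduced-dimension images; dist.ratio.sd and the pair count are just made-up placeholders.

# Rough-and-ready fidelity measure: for sampled pairs of rows, compare the
# reduced-space distance to the original distance and look at the spread of the ratios.
dist.ratio.sd = function(a, aa, pairs = 1000) {
  i = sample(nrow(a), pairs, replace = TRUE)
  j = sample(nrow(a), pairs, replace = TRUE)
  keep = i != j
  i = i[keep]; j = j[keep]
  d.old = sqrt(rowSums((a[i,] - a[j,])^2))    # original pairwise distances
  d.new = sqrt(rowSums((aa[i,] - aa[j,])^2))  # distances after reduction
  sd(d.new / d.old)
}
# e.g. compare plain RP against RP + orthogonalization:
# dist.ratio.sd(a, a %*% matrix(rnorm(30000), ncol=30) / sqrt(30))   # plain RP, scaled
# dist.ratio.sd(a, aa)                                               # aa = a %*% v from the session above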
>>>>>>
>>>>>> On Fri, Jul 1, 2011 at 11:03 AM, Ted Dunning <[email protected]> wrote:
>>>>>>> Lance,
>>>>>>>
>>>>>>> You would get better results from the random projection if you did the first part of the stochastic SVD. Basically, you do the random projection:
>>>>>>>
>>>>>>> Y = A \Omega
>>>>>>>
>>>>>>> where A is your original data, \Omega is the random matrix, and Y is the result. Y will be tall and skinny.
>>>>>>>
>>>>>>> Then, find an orthogonal basis Q of Y:
>>>>>>>
>>>>>>> Q R = Y
>>>>>>>
>>>>>>> This orthogonal basis will be very close to the orthogonal basis of A. In fact, there are strong probabilistic guarantees on how good Q is as a basis of A. Next, you project A using the transpose of Q:
>>>>>>>
>>>>>>> B = Q' A
>>>>>>>
>>>>>>> This gives you a short fat matrix that is the projection of A into a lower dimensional space. Since this is a left projection, it isn't quite what you want in your work, but it is the standard way to phrase things. The exact same thing can be done with left random projection:
>>>>>>>
>>>>>>> Y = \Omega A
>>>>>>> L Q = Y
>>>>>>> B = A Q'
>>>>>>>
>>>>>>> In this form, B is tall and skinny as you would like, and Q' is essentially an orthogonal reformulation of the random projection. This projection is about as close as you are likely to get to something that exactly preserves distances. As such, you should be able to use MDS on B to get exactly the same results as you want.
>>>>>>>
>>>>>>> Additionally, if you start with the original form and do an SVD of B (which is fast), you will get a very good approximation of the prominent right singular vectors of A. If you do that, the first few of these should be about as good as MDS for visualization purposes.
>>>>>>>
>>>>>>> On Fri, Jul 1, 2011 at 2:44 AM, Lance Norskog <[email protected]> wrote:
>>>>>>>> I did some testing and made a lot of pretty charts:
>>>>>>>>
>>>>>>>> http://ultrawhizbang.blogspot.com/
>>>>>>>>
>>>>>>>> If you want to get quick visualizations of your clusters, this is a great place to start.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Lance Norskog
>>>>>>>> [email protected]

--
Lance Norskog
[email protected]
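PS: a minimal R sketch of the left-projection form Ted describes above (Y = \Omega A, then L Q = Y, then B = A Q'), written in the style of the R session quoted above. omega, yy, qq, and bb are made-up names, and the rank-20 test matrix is rebuilt the same way as in that session; this is just a sketch of the recipe, not code from the thread.

# Same rank-20 test matrix as in the R session above
a = matrix(rnorm(20000), ncol=20) %*% matrix(rnorm(20000), nrow=20)
# Left random projection: Y = \Omega A, short and fat
omega = matrix(rnorm(30000), nrow=30)   # 30 x 1000 random matrix
yy = omega %*% a                        # 30 x 1000
# L Q = Y: Q (orthonormal rows) comes from a QR decomposition of t(Y)
qq = t(qr.Q(qr(t(yy))))                 # 30 x 1000, rows are orthonormal
# B = A Q': tall and skinny, suitable as direct input to MDS
bb = a %*% t(qq)                        # 1000 x 30
dim(bb)
# For this rank-20 a, row dot products (and hence distances) should survive almost exactly:
abs(sum(a[1,] * a[2,]) - sum(bb[1,] * bb[2,]))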
