Thanks! Very illuminating.


On Thu, Jul 7, 2011 at 10:15 PM, Ted Dunning <[email protected]> wrote:
> This means that the rank 2 reconstruction of your matrix is close to your
> original in the sense that the Frobenius norm of the difference will be
> small.
>
> In particular, the Frobenius norm of the delta should be the same as the
> Frobenius norm of the missing singular values (root sum squared missing
> values, that is).  That is just the Eckart-Young theorem:
> ||A - A_k||_F = sqrt(sigma_{k+1}^2 + ... + sigma_n^2) for the rank-k
> truncation A_k.
>
> Here is an example.  First I use a random 20x7 matrix to get an SVD into
> which I transplant your singular values.  This gives me a matrix whose
> decomposition is the same as the one you are using.
>
> I then do that decomposition and truncate the singular values to get an
> approximate matrix.  The Frobenius norm of the difference is the same as the
> Frobenius norm of the missing singular values.
>
>> m = matrix(rnorm(20*7), nrow=20)
>> svd1 = svd(m)
>> length(svd1$d)
> [1] 7
>> d = c(0.7, 0.2,0.05, 0.02, 0.01, 0.01, 0.01)
>> m2 = svd1$u %*% diag(d) %*% t(svd1$v)
>> svd = svd(m2)
>> svd$d
> [1] 0.70 0.20 0.05 0.02 0.01 0.01 0.01
>> m3 = svd$u[,1:2] %*% diag(svd$d[1:2]) %*% t(svd$v[,1:2])
>> dim(m3)
> [1] 20  7
>> m2-m3
>                [,1]          [,2]          [,3]          [,4]          [,5]          [,6]          [,7]
>  [1,]  0.0069233794  0.0020467352 -0.0071659763 -4.099546e-03  0.0056399256 -0.0023953930 -0.0119370905
>  [2,] -0.0018546491  0.0011631030  0.0013261685 -1.193252e-03  0.0002839689  0.0014320601  0.0036207164
>  [3,]  0.0011612177  0.0027845827 -0.0023368373 -4.240565e-03  0.0009362635 -0.0032987483 -0.0019110953
>  [4,] -0.0061414783  0.0070092709  0.0066429461  2.240401e-03 -0.0003033182 -0.0031444510  0.0027860257
>  [5,]  0.0004910556 -0.0057660226  0.0014586550 -3.383020e-03 -0.0015763103  0.0011357677  0.0101147998
>  [6,]  0.0016672016 -0.0043701670 -0.0002311687 -1.706181e-04 -0.0032324629 -0.0033587690  0.0018471306
>  [7,] -0.0024146270  0.0007510238  0.0052282604  7.724380e-04 -0.0004411600 -0.0026622302  0.0050655693
>  [8,]  0.0036106469  0.0028629467 -0.0007957853  1.333764e-03  0.0074933368  0.0008158132 -0.0091284389
>  [9,]  0.0013369776  0.0036364763 -0.0009691292 -2.050044e-03  0.0021208815 -0.0042241753 -0.0043885229
> [10,]  0.0031153692  0.0003852343 -0.0053822410 -6.538480e-04  0.0005221515 -0.0003594550 -0.0077290438
> [11,] -0.0012286952  0.0026373981  0.0017958449  4.693112e-05  0.0003753286 -0.0024000476 -0.0001261246
> [12,] -0.0024890888 -0.0018374670  0.0048781861 -1.065282e-03 -0.0018902396 -0.0013280442  0.0096305420
> [13,]  0.0099545328 -0.0012843802 -0.0035220130  1.599559e-03  0.0081592109 -0.0047310711 -0.0158840779
> [14,] -0.0029835169  0.0046807105  0.0016607724  4.339315e-03 -0.0015926183 -0.0026172305 -0.0048268815
> [15,] -0.0102632616  0.0033271770  0.0104700407  2.728651e-03 -0.0037697307  0.0016053567  0.0145899365
> [16,] -0.0074888872 -0.0002277379  0.0068370652 -4.688963e-05 -0.0044921343  0.0024889009  0.0150436991
> [17,] -0.0068501952 -0.0017733601  0.0076497285  1.743932e-03 -0.0055472565  0.0006109667  0.0142443162
> [18,] -0.0020245716 -0.0011431425  0.0044837803  3.219527e-04  0.0007887701  0.0019836205  0.0070585228
> [19,] -0.0016059867 -0.0028328316  0.0032223649  2.025061e-03 -0.0019194344  0.0009643023  0.0052452638
> [20,]  0.0042324909 -0.0063013966 -0.0041269199 -9.848214e-04 -0.0029591571 -0.0015911580 -0.0012584919
>> sqrt(sum((m2-m3)^2))
> [1] 0.05656854
>> sqrt(sum(d[3:7]^2))
> [1] 0.05656854
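>
> In relative terms, the loss is the root-sum-square of the dropped singular
> values divided by the root-sum-square of all of them.  A one-line check
> (a sketch, with the same d as above):
>
>> sqrt(sum(d[3:7]^2)) / sqrt(sum(d^2))
>
> Here that ratio is about 0.0775, i.e. under 8% of the total Frobenius norm.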
>
> On Thu, Jul 7, 2011 at 8:46 PM, Lance Norskog <[email protected]> wrote:
>
>> Rats. "My 3D coordinates" should be "My 2D coordinates". Also, there is a
>> preposition missing in the first sentence.
>>
>> On Thu, Jul 7, 2011 at 8:44 PM, Lance Norskog <[email protected]> wrote:
>> > The singular values in my experiments drop like a rock. What is
>> > information/probability loss formula when dropping low-value vectors?
>> >
>> > That is, I start with a 7D vector set, go through this random
>> > projection/SVD exercise, and get these singular values: [0.7, 0.2,
>> > 0.05, 0.02, 0.01, 0.01, 0.01]. I drop the last five to create a matrix
>> > that gives 2D coordinates. The sum of the remaining singular values is
>> > 0.9. My 3D vectors will be missing 0.10 of *something* compared to the
>> > original 7D vectors. What is this something and what other concepts
>> > does it plug into?
>> >
>> > Lance
>> >
>> > On Sat, Jul 2, 2011 at 11:54 PM, Lance Norskog <[email protected]> wrote:
>> >> The singular values on my recommender vectors come out: 90, 10, 1.2,
>> >> 1.1, 1.0, 0.95, ...  This was from playing with your R code. Based on
>> >> this, I'm adding the QR stuff to my visualization toolkit.
>> >>
>> >> Lance
>> >>
>> >> On Sat, Jul 2, 2011 at 10:15 PM, Lance Norskog <[email protected]> wrote:
>> >>> All pairwise distances are preserved? There must be a qualifier on
>> >>> that claim for pairwise-distance algorithms.
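>> >>>
>> >>> One direct check in R (a sketch; a is the original matrix and aa its
>> >>> projection, as in the session below):
>> >>>
>> >>>   max(abs(dist(a[1:50,]) - dist(aa[1:50,])))
>> >>>
>> >>> With an orthonormal transform at the end, that difference should be tiny.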
>> >>>
>> >>> On Sat, Jul 2, 2011 at 6:49 PM, Lance Norskog <[email protected]> wrote:
>> >>>> Cool. The plots are fun. The way Gaussian spots project into spinning
>> >>>> chains is very educational about entropy.
>> >>>>
>> >>>> For full Random Projection, a lame random number generator
>> >>>> (java.lang.Random) will generate a higher standard deviation than a
>> >>>> high-quality one like MurmurHash.
>> >>>>
>> >>>> On Fri, Jul 1, 2011 at 5:25 PM, Ted Dunning <[email protected]> wrote:
>> >>>>> Here is R code that demonstrates what I mean by stunning (aka 15
>> >>>>> significant figures).  Note that I only check dot products here.  From
>> >>>>> the fact that the final transform is orthonormal we know that all
>> >>>>> distances are preserved.
>> >>>>>
>> >>>>> # make a big random matrix with rank 20
>> >>>>>> a = matrix(rnorm(20000), ncol=20) %*% matrix(rnorm(20000), nrow=20);
>> >>>>>> dim(a)
>> >>>>> [1] 1000 1000
>> >>>>> # random projection
>> >>>>>> y = a %*% matrix(rnorm(30000), ncol=30);
>> >>>>> # get basis for y
>> >>>>>> q = qr.Q(qr(y))
>> >>>>>> dim(q)
>> >>>>> [1] 1000   30
>> >>>>> # project a onto that basis to get b, then do svd on the result
>> >>>>>> b = t(q) %*% a
>> >>>>>> v = svd(b)$v
>> >>>>>> d = svd(b)$d
>> >>>>> # note how singular values drop like a stone at index 21
>> >>>>>> plot(d)
>> >>>>>> dim(v)
>> >>>>> [1] 1000   30
>> >>>>> # finish svd just for fun
>> >>>>>> u = q %*% svd(b)$u
>> >>>>>> dim(u)
>> >>>>> [1] 1000   30
>> >>>>> # u is orthogonal, right?
>> >>>>>> diag(t(u)%*% u)
>> >>>>>  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>> >>>>> # and u diag(d) v' reconstructs a very precisely, right?
>> >>>>>> max(abs(a-u %*% diag(d) %*% t(v)))
>> >>>>> [1] 1.293188e-12
>> >>>>>
>> >>>>> # now project a into the reduced dimensional space
>> >>>>>> aa = a%*%v
>> >>>>>> dim(aa)
>> >>>>> [1] 1000   30
>> >>>>> # check a few dot products
>> >>>>>> sum(aa[1,] %*% aa[2,])
>> >>>>> [1] 6835.152
>> >>>>>> sum(a[1,] %*% a[2,])
>> >>>>> [1] 6835.152
>> >>>>>> sum(a[1,] %*% a[3,])
>> >>>>> [1] 3337.248
>> >>>>>> sum(aa[1,] %*% aa[3,])
>> >>>>> [1] 3337.248
>> >>>>>
>> >>>>> # wow, that's close.  let's try a hundred dot products
>> >>>>>> dot1 = rep(0,100); dot2 = rep(0,100)
>> >>>>>> for (i in 1:100) {dot1[i] = sum(a[1,] * a[i,]); dot2[i] = sum(aa[1,] * aa[i,])}
>> >>>>>
>> >>>>> # how close to the same are those?
>> >>>>>> max(abs(dot1-dot2))
>> >>>>> [1] 3.45608e-11
>> >>>>> # VERY
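>> >>>>>
>> >>>>> # distances match too, since the transform v is orthonormal
>> >>>>> # (a sketch, reusing a and aa from above; should also be tiny):
>> >>>>>> max(abs(dist(a[1:20,]) - dist(aa[1:20,])))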
>> >>>>>
>> >>>>> On Fri, Jul 1, 2011 at 4:54 PM, Ted Dunning <[email protected]> wrote:
>> >>>>>
>> >>>>>> Yes.  Been there.  Done that.
>> >>>>>>
>> >>>>>> The correlation is stunningly good.
>> >>>>>>
>> >>>>>>
>> >>>>>> On Fri, Jul 1, 2011 at 4:22 PM, Lance Norskog <[email protected]> wrote:
>> >>>>>>
>> >>>>>>> Thanks!
>> >>>>>>>
>> >>>>>>> Is this true? "Preserving pairwise distances" means the relative
>> >>>>>>> distances, so the ratios of new distance to old distance should be
>> >>>>>>> nearly constant. The standard deviation of those ratios gives a
>> >>>>>>> rough-and-ready measure of the fidelity of the reduction. The
>> >>>>>>> standard deviation of simple RP should be highest, then this RP +
>> >>>>>>> orthogonalization, then MDS.
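>> >>>>>>>
>> >>>>>>> Something like this in R, say (a sketch; x is the original data and
>> >>>>>>> xr the reduced version, both hypothetical names):
>> >>>>>>>
>> >>>>>>>   r = as.vector(dist(xr)) / as.vector(dist(x))
>> >>>>>>>   sd(r) / mean(r)   # relative spread of the distance ratios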
>> >>>>>>>
>> >>>>>>> On Fri, Jul 1, 2011 at 11:03 AM, Ted Dunning <[email protected]> wrote:
>> >>>>>>> > Lance,
>> >>>>>>> >
>> >>>>>>> > You would get better results from the random projection if you did
>> >>>>>>> > the first part of the stochastic SVD.  Basically, you do the random
>> >>>>>>> > projection:
>> >>>>>>> >
>> >>>>>>> >       Y = A \Omega
>> >>>>>>> >
>> >>>>>>> > where A is your original data, \Omega is the random matrix and Y is
>> >>>>>>> > the result.  Y will be tall and skinny.
>> >>>>>>> >
>> >>>>>>> > Then, find an orthogonal basis Q of Y:
>> >>>>>>> >
>> >>>>>>> >       Q R = Y
>> >>>>>>> >
>> >>>>>>> > This orthogonal basis will be very close to the orthogonal basis
>> >>>>>>> > of A.  In fact, there are strong probabilistic guarantees on how
>> >>>>>>> > good Q is as a basis of A.  Next, you project A using the
>> >>>>>>> > transpose of Q:
>> >>>>>>> >
>> >>>>>>> >       B = Q' A
>> >>>>>>> >
>> >>>>>>> > This gives you a short fat matrix that is the projection of A into
>> >>>>>>> > a lower dimensional space.  Since this is a left projection, it
>> >>>>>>> > isn't quite what you want in your work, but it is the standard way
>> >>>>>>> > to phrase things.  The exact same thing can be done with left
>> >>>>>>> > random projection:
>> >>>>>>> >
>> >>>>>>> >      Y = \Omega A
>> >>>>>>> >      L Q = Y
>> >>>>>>> >      B = A Q'
>> >>>>>>> >
>> >>>>>>> > In this form, B is tall and skinny as you would like and Q' is
>> >>>>>>> > essentially an orthogonal reformulation of the random projection.
>> >>>>>>> > This projection is about as close as you are likely to get to
>> >>>>>>> > something that exactly preserves distances.  As such, you should be
>> >>>>>>> > able to use MDS on B to get exactly the same results as you want.
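>> >>>>>>> >
>> >>>>>>> > In R the left form might look like (a sketch; a is the data matrix
>> >>>>>>> > and 30 the target dimension):
>> >>>>>>> >
>> >>>>>>> >      omega = matrix(rnorm(30 * nrow(a)), nrow=30)  # random Omega
>> >>>>>>> >      y = omega %*% a                  # short fat Y = Omega A
>> >>>>>>> >      q = t(qr.Q(qr(t(y))))            # rows of q orthonormal: L Q = Y
>> >>>>>>> >      b = a %*% t(q)                   # tall skinny B = A Q'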
>> >>>>>>> >
>> >>>>>>> > Additionally, if you start with the original form and do an SVD of
>> >>>>>>> > B (which is fast), you will get a very good approximation of the
>> >>>>>>> > prominent right singular vectors of A.  If you do that, the first
>> >>>>>>> > few of these should be about as good as MDS for visualization
>> >>>>>>> > purposes.
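>> >>>>>>> >
>> >>>>>>> > For instance (a sketch, reusing b = Q' A from the first form):
>> >>>>>>> >
>> >>>>>>> >      v = svd(b)$v             # approximate right singular vectors of A
>> >>>>>>> >      coords = a %*% v[, 1:2]  # first two coordinates, ready to plot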
>> >>>>>>> >
>> >>>>>>> > On Fri, Jul 1, 2011 at 2:44 AM, Lance Norskog <[email protected]> wrote:
>> >>>>>>> >
>> >>>>>>> >> I did some testing and made a lot of pretty charts:
>> >>>>>>> >>
>> >>>>>>> >> http://ultrawhizbang.blogspot.com/
>> >>>>>>> >>
>> >>>>>>> >> If you want to get quick visualizations of your clusters, this is
>> >>>>>>> >> a great place to start.
-- 
Lance Norskog
[email protected]
