OK, got a little farther. Now, what is useful about projecting with
the factorization? My outputs looked maybe a little more regular. One
surprising thing turned up: the vector set has one clump and 3 far
outlier points. The fact that these are outliers did not show up in
other visualizations, but when projected with the Q from the QR
factorization (qr.Q), the 3 outliers jumped far away from the clump.

What exactly is going on here? Why does rotating with qr.Q create such
sharply segregated clumps?
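Below is a minimal R sketch, in the style of the R transcripts quoted below, that reproduces the setup described above: a tight clump plus 3 far outliers, run through a random projection followed by qr.Q. The data, dimensions, and seed are made up for illustration.

# made-up data: a tight clump of 200 points plus 3 far outliers, in 7 dimensions
set.seed(1)
clump = matrix(rnorm(200 * 7, sd = 0.1), ncol = 7)
outliers = matrix(rnorm(3 * 7), ncol = 7) * 10   # 3 far points in random directions
a = rbind(clump, outliers)

# random projection to 3 dimensions, then the orthonormal basis from QR
y = a %*% matrix(rnorm(7 * 3), ncol = 3)
q = qr.Q(qr(y))

# plot the rows of q; with data like this the 3 outlier rows typically land
# far from the clump, which matches the behavior described above
plot(q[, 1], q[, 2])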

On Sun, Jul 10, 2011 at 8:02 PM, Lance Norskog <[email protected]> wrote:
> I've tried the QR factorization trick described above, and posted the
> new pictures:
> http://ultrawhizbang.blogspot.com/2011/07/dimensional-reduction-via-random_10.html
>
> I really need to see movie names to get farther with this.
>
> On 7/9/11, Ted Dunning <[email protected]> wrote:
>> The problem is still not precisely stated. You have 95% confidence in what?
>> Do you know that 95% of the entries are exact and that the others can be
>> arbitrarily corrupted?  If so, you can say nothing, because the error is
>> unbounded in terms of squared error.
>>
>> In order to say something stronger you have to phrase your error in terms of
>> something that has invariant properties. This is why invariants are such
>> important concepts in such a variety of quantitative endeavors.
>>
>> Sent from my iPhone
>>
>> On Jul 9, 2011, at 15:51, Lance Norskog <[email protected]> wrote:
>>
>>> Yes, I misphrased it. Let's say I have a dataset in whose correctness I
>>> have 95% confidence; if I multiply it by a matrix which does not lose
>>> information, I will have 95% confidence in the output data.
>>>
>>> Now, let's downsize the matrix by the above numbers. The remaining
>>> matrix contains the singular vectors whose singular values sum to 0.9;
>>> the singular vectors carrying the remaining 0.1 are missing. What is my
>>> confidence in the output data?
>>>
>>> On Sat, Jul 9, 2011 at 11:46 AM, Ted Dunning <[email protected]>
>>> wrote:
>>>> I don't understand the question.
>>>>
>>>> A rotation leaves the Frobenius norm unchanged.  Period.  Any
>>>> rank-limited
>>>> optimal least-squares approximation in one rotated/translated frame is
>>>> optimal in any other such frame.  What do you mean by 99% confidence
>>>> here?
>>>>  The approximation is optimal or not.
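A quick numerical illustration of the invariance claim above, as an R sketch with arbitrary dimensions: multiplying by an orthonormal matrix leaves the Frobenius norm unchanged.

m = matrix(rnorm(20 * 7), nrow = 20)
q = qr.Q(qr(matrix(rnorm(20 * 20), nrow = 20)))  # a random 20x20 orthonormal matrix
sqrt(sum(m^2))           # Frobenius norm of m
sqrt(sum((q %*% m)^2))   # same value after the rotation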
>>>>
>>>> On Fri, Jul 8, 2011 at 7:49 PM, Lance Norskog <[email protected]> wrote:
>>>>
>>>>> Getting closer: if I have a matrix that does a particular
>>>>> rotation/translation etc. with a 99% confidence interval, and I do the
>>>>> above trimming operation, is it possible to translate this into a new
>>>>> confidence level? Are there specific ways to translate these numbers
>>>>> into probabilistic estimates? Is it just way too hairy?
>>>>>
>>>>> Lance
>>>>>
>>>>> On Thu, Jul 7, 2011 at 10:15 PM, Ted Dunning <[email protected]>
>>>>> wrote:
>>>>>> This means that the rank 2 reconstruction of your matrix is close to
>>>>>> your
>>>>>> original in the sense that the Frobenius norm of the difference will be
>>>>>> small.
>>>>>>
>>>>>> In particular, the Frobenius norm of the delta should be the same as
>>>>>> the
>>>>>> Frobenius norm of the missing singular values (root sum squared missing
>>>>>> values, that is).
>>>>>>
>>>>>> Here is an example.  First I use a random 20x7 matrix to get an SVD
>>>>>> into
>>>>>> which I transplant your singular values.  This gives me a matrix whose
>>>>>> decomposition is the same as the one you are using.
>>>>>>
>>>>>> I then do that decomposition and truncate the singular values to get an
>>>>>> approximate matrix.  The Frobenius norm of the difference is the same
>>>>>> as
>>>>> the
>>>>>> Frobenius norm of the missing singular values.
>>>>>>
>>>>>>> m = matrix(rnorm(20*7), nrow=20)
>>>>>>> svd1 = svd(m)
>>>>>>> length(svd1$d)
>>>>>> [1] 7
>>>>>>> d = c(0.7, 0.2,0.05, 0.02, 0.01, 0.01, 0.01)
>>>>>>> m2 = svd1$u %*% diag(d) %*% t(svd1$v)
>>>>>>> svd = svd(m2)
>>>>>>> svd$d
>>>>>> [1] 0.70 0.20 0.05 0.02 0.01 0.01 0.01
>>>>>>> m3 = svd$u[,1:2] %*% diag(svd$d[1:2]) %*% t(svd$v[,1:2])
>>>>>>> dim(m3)
>>>>>> [1] 20  7
>>>>>>> m2-m3
>>>>>>                [,1]          [,2]          [,3]          [,4]          [,5]          [,6]          [,7]
>>>>>>  [1,]  0.0069233794  0.0020467352 -0.0071659763 -4.099546e-03  0.0056399256 -0.0023953930 -0.0119370905
>>>>>>  [2,] -0.0018546491  0.0011631030  0.0013261685 -1.193252e-03  0.0002839689  0.0014320601  0.0036207164
>>>>>>  [3,]  0.0011612177  0.0027845827 -0.0023368373 -4.240565e-03  0.0009362635 -0.0032987483 -0.0019110953
>>>>>>  [4,] -0.0061414783  0.0070092709  0.0066429461  2.240401e-03 -0.0003033182 -0.0031444510  0.0027860257
>>>>>>  [5,]  0.0004910556 -0.0057660226  0.0014586550 -3.383020e-03 -0.0015763103  0.0011357677  0.0101147998
>>>>>>  [6,]  0.0016672016 -0.0043701670 -0.0002311687 -1.706181e-04 -0.0032324629 -0.0033587690  0.0018471306
>>>>>>  [7,] -0.0024146270  0.0007510238  0.0052282604  7.724380e-04 -0.0004411600 -0.0026622302  0.0050655693
>>>>>>  [8,]  0.0036106469  0.0028629467 -0.0007957853  1.333764e-03  0.0074933368  0.0008158132 -0.0091284389
>>>>>>  [9,]  0.0013369776  0.0036364763 -0.0009691292 -2.050044e-03  0.0021208815 -0.0042241753 -0.0043885229
>>>>>> [10,]  0.0031153692  0.0003852343 -0.0053822410 -6.538480e-04  0.0005221515 -0.0003594550 -0.0077290438
>>>>>> [11,] -0.0012286952  0.0026373981  0.0017958449  4.693112e-05  0.0003753286 -0.0024000476 -0.0001261246
>>>>>> [12,] -0.0024890888 -0.0018374670  0.0048781861 -1.065282e-03 -0.0018902396 -0.0013280442  0.0096305420
>>>>>> [13,]  0.0099545328 -0.0012843802 -0.0035220130  1.599559e-03  0.0081592109 -0.0047310711 -0.0158840779
>>>>>> [14,] -0.0029835169  0.0046807105  0.0016607724  4.339315e-03 -0.0015926183 -0.0026172305 -0.0048268815
>>>>>> [15,] -0.0102632616  0.0033271770  0.0104700407  2.728651e-03 -0.0037697307  0.0016053567  0.0145899365
>>>>>> [16,] -0.0074888872 -0.0002277379  0.0068370652 -4.688963e-05 -0.0044921343  0.0024889009  0.0150436991
>>>>>> [17,] -0.0068501952 -0.0017733601  0.0076497285  1.743932e-03 -0.0055472565  0.0006109667  0.0142443162
>>>>>> [18,] -0.0020245716 -0.0011431425  0.0044837803  3.219527e-04  0.0007887701  0.0019836205  0.0070585228
>>>>>> [19,] -0.0016059867 -0.0028328316  0.0032223649  2.025061e-03 -0.0019194344  0.0009643023  0.0052452638
>>>>>> [20,]  0.0042324909 -0.0063013966 -0.0041269199 -9.848214e-04 -0.0029591571 -0.0015911580 -0.0012584919
>>>>>>> sqrt(sum((m2-m3)^2))
>>>>>> [1] 0.05656854
>>>>>>> sqrt(sum(d[3:7]^2))
>>>>>> [1] 0.05656854
>>>>>>>
>>>>>>
>>>>>> On Thu, Jul 7, 2011 at 8:46 PM, Lance Norskog <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Rats - "My 3D coordinates" should be 'My 2D coordinates'. And there is a
>>>>>>> preposition missing in the first sentence.
>>>>>>>
>>>>>>> On Thu, Jul 7, 2011 at 8:44 PM, Lance Norskog <[email protected]>
>>>>> wrote:
>>>>>>>> The singular values in my experiments drop like a rock. What is
>>>>>>>> information/probability loss formula when dropping low-value vectors?
>>>>>>>>
>>>>>>>> That is, I start with a 7D vector set, go through this random
>>>>>>>> projection/svd exercise, and get these singular values: [0.7, 0.2,
>>>>>>>> 0.05, 0.02, 0.01, 0.01, 0.01]. I drop the last five to create a
>>>>>>>> matrix
>>>>>>>> that gives 2D coordinates. The sum of the remaining singular values
>>>>>>>> is
>>>>>>>> 0.9. My 3D vectors will be missing 0.10 of *something* compared to
>>>>>>>> the
>>>>>>>> original 7D vectors. What is this something and what other concepts
>>>>>>>> does it plug into?
>>>>>>>>
>>>>>>>> Lance
>>>>>>>>
>>>>>>>> On Sat, Jul 2, 2011 at 11:54 PM, Lance Norskog <[email protected]>
>>>>>>> wrote:
>>>>>>>>> The singular values on my recommender vectors come out: 90, 10, 1.2,
>>>>>>>>> 1.1, 1.0, 0.95, ... This was from playing with your R code. Based on
>>>>>>>>> this, I'm adding the QR stuff to my visualization toolkit.
>>>>>>>>>
>>>>>>>>> Lance
>>>>>>>>>
>>>>>>>>> On Sat, Jul 2, 2011 at 10:15 PM, Lance Norskog <[email protected]>
>>>>>>> wrote:
>>>>>>>>>> All pairwise distances are preserved? There must be a qualifier on
>>>>>>>>>> pairwise distance algorithms.
>>>>>>>>>>
>>>>>>>>>> On Sat, Jul 2, 2011 at 6:49 PM, Lance Norskog <[email protected]>
>>>>>>> wrote:
>>>>>>>>>>> Cool. The plots are fun. The way Gaussian spots project into
>>>>> spinning
>>>>>>>>>>> chains is very educational about entropy.
>>>>>>>>>>>
>>>>>>>>>>> For full Random Projection, a lame random number generator
>>>>>>>>>>> (java.lang.Random) will generate a higher standard deviation than
>>>>>>>>>>> a
>>>>>>>>>>> high-quality one like MurmurHash.
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jul 1, 2011 at 5:25 PM, Ted Dunning <[email protected]
>>>>>>
>>>>>>> wrote:
>>>>>>>>>>>> Here is R code that demonstrates what I mean by stunning (aka 15
>>>>>>> significant
>>>>>>>>>>>> figures).  Note that I only check dot products here.  From the
>>>>> fact
>>>>>>> that the
>>>>>>>>>>>> final transform is orthonormal we know that all distances are
>>>>>>> preserved.
>>>>>>>>>>>>
>>>>>>>>>>>> # make a big random matrix with rank 20
>>>>>>>>>>>>> a = matrix(rnorm(20000), ncol=20) %*% matrix(rnorm(20000), nrow=20);
>>>>>>>>>>>>> dim(a)
>>>>>>>>>>>> [1] 1000 1000
>>>>>>>>>>>> # random projection
>>>>>>>>>>>>> y = a %*% matrix(rnorm(30000), ncol=30);
>>>>>>>>>>>> # get basis for y
>>>>>>>>>>>>> q = qr.Q(qr(y))
>>>>>>>>>>>>> dim(q)
>>>>>>>>>>>> [1] 1000   30
>>>>>>>>>>>> # re-project b, do svd on result
>>>>>>>>>>>>> b = t(q) %*% a
>>>>>>>>>>>>> v = svd(b)$v
>>>>>>>>>>>>> d = svd(b)$d
>>>>>>>>>>>> # note how singular values drop like a stone at index 21
>>>>>>>>>>>>> plot(d)
>>>>>>>>>>>>> dim(v)
>>>>>>>>>>>> [1] 1000   30
>>>>>>>>>>>> # finish svd just for fun
>>>>>>>>>>>>> u = q %*% svd(b)$u
>>>>>>>>>>>>> dim(u)
>>>>>>>>>>>> [1] 1000   30
>>>>>>>>>>>> # u is orthogonal, right?
>>>>>>>>>>>>> diag(t(u)%*% u)
>>>>>>>>>>>>  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>>>>>>>>>>>> # and u diag(d) v' reconstructs a very precisely, right?
>>>>>>>>>>>>> max(abs(a-u %*% diag(d) %*% t(v)))
>>>>>>>>>>>> [1] 1.293188e-12
>>>>>>>>>>>>
>>>>>>>>>>>> # now project a into the reduced dimensional space
>>>>>>>>>>>>> aa = a%*%v
>>>>>>>>>>>>> dim(aa)
>>>>>>>>>>>> [1] 1000   30
>>>>>>>>>>>> # check a few dot products
>>>>>>>>>>>>> sum(aa[1,] %*% aa[2,])
>>>>>>>>>>>> [1] 6835.152
>>>>>>>>>>>>> sum(a[1,] %*% a[2,])
>>>>>>>>>>>> [1] 6835.152
>>>>>>>>>>>>> sum(a[1,] %*% a[3,])
>>>>>>>>>>>> [1] 3337.248
>>>>>>>>>>>>> sum(aa[1,] %*% aa[3,])
>>>>>>>>>>>> [1] 3337.248
>>>>>>>>>>>>
>>>>>>>>>>>> # wow, that's close.  let's try a hundred dot products
>>>>>>>>>>>>> dot1 = rep(0,100);dot2 = rep(0,100)
>>>>>>>>>>>>> for (i in 1:100) {dot1[i] = sum(a[1,] * a[i,]); dot2[i] = sum(aa[1,]*aa[i,])}
>>>>>>>>>>>>
>>>>>>>>>>>> # how close to the same are those?
>>>>>>>>>>>>> max(abs(dot1-dot2))
>>>>>>>>>>>> # VERY
>>>>>>>>>>>> [1] 3.45608e-11
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jul 1, 2011 at 4:54 PM, Ted Dunning <
>>>>> [email protected]>
>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Yes.  Been there.  Done that.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The correlation is stunningly good.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jul 1, 2011 at 4:22 PM, Lance Norskog <[email protected]
>>>>>>
>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is this true? - "Preserving pairwise distances" means the relative
>>>>>>>>>>>>>> distances. So the ratios of new distance : old distance should be
>>>>>>>>>>>>>> similar. The standard deviation of the ratios gives a rough-and-ready
>>>>>>>>>>>>>> measure of the fidelity of the reduction. The standard deviation of
>>>>>>>>>>>>>> simple RP should be highest, then this RP + orthogonalization, then
>>>>>>>>>>>>>> MDS.
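One way to compute the rough-and-ready measure proposed above is sketched below in R; the data and dimensions are made up, and the sketch only shows how to measure the spread of the ratios, not which method comes out lowest.

set.seed(2)
x = matrix(rnorm(200 * 50), nrow = 200)            # 200 points in 50 dimensions
r = matrix(rnorm(50 * 10), nrow = 50) / sqrt(10)   # scaled random projection to 10 dims
y = x %*% r

ratios = as.vector(dist(y)) / as.vector(dist(x))   # new distance : old distance
sd(ratios)                                         # spread of the ratios = fidelity measure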
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Jul 1, 2011 at 11:03 AM, Ted Dunning <
>>>>>>> [email protected]>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> Lance,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You would get better results from the random projection if you
>>>>>>> did the
>>>>>>>>>>>>>> first
>>>>>>>>>>>>>>> part of the stochastic SVD.  Basically, you do the random
>>>>>>> projection:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>       Y = A \Omega
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> where A is your original data, \Omega is the random matrix, and Y is
>>>>>>>>>>>>>>> the result.
>>>>>>>>>>>>>>>  Y will be tall and skinny.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Then, find an orthogonal basis Q of Y:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>       Q R = Y
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This orthogonal basis will be very close to the orthogonal
>>>>> basis
>>>>>>> of A.
>>>>>>>>>>>>>>  In
>>>>>>>>>>>>>>> fact, there are strong probabilistic guarantees on how good Q
>>>>> is
>>>>>>> as a
>>>>>>>>>>>>>> basis
>>>>>>>>>>>>>>> of A.  Next, you project A using the transpose of Q:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>       B = Q' A
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This gives you a short fat matrix that is the projection of A
>>>>>>> into a
>>>>>>>>>>>>>> lower
>>>>>>>>>>>>>>> dimensional space.  Since this is a left projection, it isn't
>>>>>>> quite what
>>>>>>>>>>>>>> you
>>>>>>>>>>>>>>> want in your work, but it is the standard way to phrase
>>>>> things.
>>>>>>>  The
>>>>>>>>>>>>>> exact
>>>>>>>>>>>>>>> same thing can be done with left random projection:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>      Y = \Omega A
>>>>>>>>>>>>>>>      L Q = Y
>>>>>>>>>>>>>>>      B = A Q'
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In this form, B is tall and skinny as you would like, and Q' is
>>>>>>>>>>>>>>> essentially an orthogonal reformulation of the random projection.  This
>>>>>>>>>>>>>>> projection is
>>>>>>>>>>>>>>> about as close as you are likely to get to something that
>>>>> exactly
>>>>>>>>>>>>>> preserves
>>>>>>>>>>>>>>> distances.  As such, you should be able to use MDS on B to get
>>>>>>> exactly
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> same results as you want.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Additionally, if you start with the original form and do an
>>>>> SVD
>>>>>>> of B
>>>>>>>>>>>>>> (which
>>>>>>>>>>>>>>> is fast), you will get a very good approximation of the
>>>>> prominent
>>>>>>> right
>>>>>>>>>>>>>>> singular vectors of A.  If you do that, the first few of these
>>>>>>> should be
>>>>>>>>>>>>>>> about as good as MDS for visualization purposes.
>>>>>>>>>>>>>>>
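For reference, here is a hedged R sketch of the second (left random projection) form above, using the same kind of made-up 1000x1000 rank-20 matrix as the R transcript quoted earlier in this thread. R has no direct LQ routine, so the sketch takes the QR of t(Y); its Q factor plays the role of Q' above.

# made-up 1000x1000 matrix of rank 20, as in the earlier transcript
a = matrix(rnorm(20000), ncol = 20) %*% matrix(rnorm(20000), nrow = 20)

omega = matrix(rnorm(30 * 1000), nrow = 30)   # 30x1000 random matrix
y = omega %*% a                               # Y = \Omega A, short and fat

qt = qr.Q(qr(t(y)))                           # 1000x30, orthonormal columns; plays the role of Q'
b = a %*% qt                                  # B = A Q', tall and skinny
dim(b)                                        # 1000 x 30

# because a has rank 20 < 30, dot products (and hence distances) should be
# preserved to numerical precision
max(abs(a[1,] %*% a[2,] - b[1,] %*% b[2,]))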
>>>>>>>>>>>>>>> On Fri, Jul 1, 2011 at 2:44 AM, Lance Norskog <
>>>>> [email protected]
>>>>>>>>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I did some testing and make a lot of pretty charts:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> http://ultrawhizbang.blogspot.com/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If you want to get quick visualizations of your clusters,
>>>>> this
>>>>>>> is a
>>>>>>>>>>>>>>>> great place to start.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Lance Norskog
>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Lance Norskog
>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Lance Norskog
>>>>>>>>>>> [email protected]
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Lance Norskog
>>>>>>>>>> [email protected]
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Lance Norskog
>>>>>>>>> [email protected]
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Lance Norskog
>>>>>>>> [email protected]
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Lance Norskog
>>>>>>> [email protected]
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Lance Norskog
>>>>> [email protected]
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Lance Norskog
>>> [email protected]
>>
>
>
> --
> Lance Norskog
> [email protected]
>



-- 
Lance Norskog
[email protected]
