This method usually isn't as numerically stable as, for instance, using a
QR decomposition, because forming A'A squares the condition number of the
data matrix A.  If your original data matrix A is n x 2, then A = QR, where
Q is n x 2 and R is 2 x 2.  R is trivial to decompose into U S V', and since
Q has orthonormal columns, the singular values and right singular vectors of
R are those of A, which is your original goal.  If you want the left
singular vectors, use QU.
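
As a rough illustration of the QR route, here is a minimal pure-Python
sketch, not Mahout code.  The function names are made up for this example,
it assumes the rows fit in a list of (a, b) pairs, and it uses modified
Gram-Schmidt to get R rather than whatever a real QR library would do:

```python
import math


def qr_r_factor(rows):
    """R factor (r11, r12, r22) of an n x 2 matrix, by modified Gram-Schmidt.

    rows is a list of (a, b) pairs; the first column must be nonzero.
    """
    r11 = math.sqrt(math.fsum(a * a for a, _ in rows))
    if r11 == 0.0:
        raise ValueError("first column is all zeros")
    # q1 = a / r11 is the first column of Q; r12 is its dot product with b.
    r12 = math.fsum(a * b for a, b in rows) / r11
    # r22 is the norm of what is left of b after removing its q1 component.
    r22 = math.sqrt(math.fsum((b - r12 * a / r11) ** 2 for a, b in rows))
    return r11, r12, r22


def singular_values_upper_2x2(r11, r12, r22):
    """Singular values (largest first) of R = [[r11, r12], [0, r22]]."""
    t = r11 * r11 + r12 * r12 + r22 * r22  # trace of R'R
    d = abs(r11 * r22)                     # |det R| = s1 * s2
    s1 = math.sqrt((t + math.sqrt(max(t * t - 4.0 * d * d, 0.0))) / 2.0)
    s2 = d / s1 if s1 > 0.0 else 0.0
    return s1, s2


rows = [(3.0, 0.0), (0.0, 4.0), (0.0, 0.0)]
s1, s2 = singular_values_upper_2x2(*qr_r_factor(rows))
print(s1, s2)
```

The smaller singular value is taken as |det R| / s1 instead of from a
subtraction, which is one way to avoid cancellation when it is tiny.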

On Thu, Jun 23, 2011 at 11:03 AM, Jake Mannix <[email protected]> wrote:

> A gazillion rows of 2-columned data is really much better suited to doing
> the following:
>
> if each row is of the form [a, b], then compute the matrix
>
> [[a*a, a*b], [a*b, b*b]]
>
> (the outer product of the vector with itself)
>
> Then take the matrix sum of all of these, from each row of your input
> matrix.
>
> You'll now have a 2x2 matrix (A'A, if A is your input matrix), which you
> can diagonalize by hand.  Its eigenvalues are the squares of the singular
> values of your original matrix, and its eigenvectors are the
> right-singular vectors of your original matrix.
>
>  -jake
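
For comparison, the sum-of-outer-products recipe above can be sketched in a
few lines of pure Python.  Again this is only an illustration under my own
assumptions (hypothetical function names, rows arriving as (a, b) pairs),
not anything that ships with Mahout:

```python
import math


def gram_2x2(rows):
    """Sum the outer products of each row [a, b] with itself.

    Returns (g11, g12, g22), the entries of the symmetric 2x2 matrix A'A.
    """
    g11 = g12 = g22 = 0.0
    for a, b in rows:
        g11 += a * a
        g12 += a * b
        g22 += b * b
    return g11, g12, g22


def diagonalize_symmetric_2x2(g11, g12, g22):
    """Eigenvalues (largest first) and unit eigenvectors of [[g11, g12], [g12, g22]]."""
    mean = (g11 + g22) / 2.0
    radius = math.hypot((g11 - g22) / 2.0, g12)
    # Angle of the eigenvector that belongs to the larger eigenvalue.
    theta = 0.5 * math.atan2(2.0 * g12, g11 - g22)
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))
    return (mean + radius, mean - radius), (v1, v2)


rows = [(3.0, 0.0), (0.0, 4.0)]
eigenvalues, right_vectors = diagonalize_symmetric_2x2(*gram_2x2(rows))
# Singular values of the data are the square roots of these eigenvalues;
# clamp at zero because rounding can push the small one slightly negative.
singular_values = [math.sqrt(max(lam, 0.0)) for lam in eigenvalues]
print(singular_values, right_vectors)
```

This needs only one pass over the rows and three running sums, which is the
appeal for 50M rows; the price is the squared condition number.
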
>
> 2011/6/23 <[email protected]>
>
> > Yes, that is exactly why I asked it for only 2 eigenvalues. So what is
> > being said is: if I have, let's say, 50M rows of 2-columned data,
> > Lanczos can't do anything with it (assuming it puts the 0 eigenvalue in
> > the mix - of the 2 eigenvectors only 1 is returned because of the 0
> > eigenvalue taking up a slot)?
> >
> > If the eigenvalue of 0 is invalid, then should it not be filtered out so
> > that it returns "rank" number of eigenvalues that could be valid?
> >
> > -Trevor
> >
> > > Ah, if your matrix only has 2 columns, you can't go to rank 10.  Try on
> > > some slightly less synthetic data of more than rank 10.  You can't
> > > ask Lanczos for more reduced rank than that of the matrix itself.
> > >
> > >   -jake
> > >
> > > 2011/6/23 <[email protected]>
> > >
> > >> Alright, I can reorder; that is easy. I just had to verify that the
> > >> ordering was correct. So when I increased the rank of the results, I
> > >> get Lanczos bailing out, which incidentally causes a
> > >> NullPointerException:
> > >>
> > >> INFO: 9 passes through the corpus so far...
> > >> WARNING: Lanczos parameters out of range: alpha = NaN, beta = NaN. Bailing out early!
> > >> INFO: Lanczos iteration complete - now to diagonalize the tri-diagonal auxiliary matrix.
> > >> Exception in thread "main" java.lang.NullPointerException
> > >>        at org.apache.mahout.math.DenseVector.assign(DenseVector.java:133)
> > >>        at org.apache.mahout.math.decomposer.lanczos.LanczosSolver.solve(LanczosSolver.java:160)
> > >>        at pca.PCASolver.solve(PCASolver.java:53)
> > >>        at pca.PCA.main(PCA.java:20)
> > >>
> > >> So I should probably note that my data only has 2 columns; the real
> > >> data will have quite a bit more.
> > >>
> > >> The failure happens with a rank of 10 or more, with the last, and
> > >> therefore most significant, eigenvector being <NaN,NaN>.
> > >>
> > >> -Trevor
> > >> > The 0 eigenvalue output is not valid, and yes, the output will list
> > >> > the results in *increasing* order, even though it is finding the
> > >> > largest eigenvalues/vectors first.
> > >> >
> > >> > Remember that convergence is gradual, so if you only ask for 3
> > >> > eigenvectors/values, you won't be very accurate.  If you ask for 10
> > >> > or more, the largest few will now be quite good.  If you ask for
> > >> > 50, now the top 10-20 will be *extremely* accurate, and maybe the
> > >> > top 30 will still be quite good.
> > >> >
> > >> > Try out a non-distributed form of what is in the
> > >> > EigenverificationJob to re-order the output and collect how
> > >> > accurate your results are (it computes errors for you as well).
> > >> >
> > >> >   -jake
> > >> >
> > >> > 2011/6/23 <[email protected]>
> > >> >
> > >> >> So, I know that MAHOUT-369 fixed a bug with the distributed
> > >> >> version of the LanczosSolver, but I am experiencing a similar
> > >> >> problem with the non-distributed version.
> > >> >>
> > >> >> I send a dataset of Gaussian-distributed numbers (testing PCA
> > >> >> stuff) and my eigenvalues are seemingly reversed. Below I have the
> > >> >> output given in the logs from LanczosSolver.
> > >> >>
> > >> >> Output:
> > >> >> INFO: Eigenvector 0 found with eigenvalue 0.0
> > >> >> INFO: Eigenvector 1 found with eigenvalue 347.8703086831804
> > >> >> INFO: LanczosSolver finished.
> > >> >>
> > >> >> So it returns a vector with eigenvalue 0 before one with an
> > >> >> eigenvalue of 347? What's more interesting is that when I increase
> > >> >> the rank, I get a new eigenvector with a value between 0 and 347:
> > >> >>
> > >> >> INFO: Eigenvector 0 found with eigenvalue 0.0
> > >> >> INFO: Eigenvector 1 found with eigenvalue 44.794928654801566
> > >> >> INFO: Eigenvector 2 found with eigenvalue 347.8286920203704
> > >> >>
> > >> >> Shouldn't the eigenvalues be in descending order? Also is the 0.0
> > >> >> eigenvalue even valid?
> > >> >>
> > >> >> Thanks,
> > >> >> Trevor
> > >> >>
> > >> >>
> > >> >
> > >>
> > >>
> > >>
> > >
> >
> >
> >
>
