Yes, that is exactly why I asked it for only 2 eigenvalues. So what is being
said is that if I have, let's say, 50M rows of 2-column data, Lanczos can't
do anything with it (assuming it puts the 0 eigenvalue in the mix: of the 2
eigenvectors only 1 is returned, because the 0 eigenvalue takes up a
slot)?
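
To make the rank limit concrete: for n rows of 2-column data the Gram matrix
A^T A is only 2x2, so there are at most 2 eigenvalues to find no matter how
many rows there are. Below is a minimal sketch in plain Java that computes
both directly. It does not use Mahout, and the class and method names are
made up for illustration.

```java
class TwoColumnSpectrum {
    // Accumulate the 2x2 Gram matrix A^T A of an n-by-2 data set in one pass.
    // Returned as {a, b, d}, standing for the symmetric matrix [[a, b], [b, d]].
    static double[] gram(double[][] rows) {
        double a = 0.0, b = 0.0, d = 0.0;
        for (double[] r : rows) {
            a += r[0] * r[0];
            b += r[0] * r[1];
            d += r[1] * r[1];
        }
        return new double[] {a, b, d};
    }

    // Closed-form eigenvalues of the symmetric 2x2 matrix [[a, b], [b, d]],
    // largest first.
    static double[] eigenvalues(double a, double b, double d) {
        double mean = (a + d) / 2.0;
        double radius = Math.hypot((a - d) / 2.0, b);
        return new double[] {mean + radius, mean - radius};
    }

    public static void main(String[] args) {
        // Tiny illustrative data set, not the data from this thread.
        double[][] rows = {{1.0, 0.0}, {0.0, 2.0}};
        double[] g = gram(rows);
        double[] ev = eigenvalues(g[0], g[1], g[2]);
        System.out.println(ev[0] + " " + ev[1]);
    }
}
```

With only 2 columns, a single pass like this is enough and an iterative
solver is not really needed; Lanczos pays off when the column count is large.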

If the eigenvalue of 0 is invalid, then should it not be filtered out so
that it returns "rank" number of eigenvalues that could be valid?
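
A minimal sketch of the kind of filtering meant here, assuming the
eigenvalues are available as a plain double array. This is not Mahout's API;
the class name, method name, and the relative tolerance are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class EigenFilter {
    // Return the indices of eigenvalues that are neither NaN nor negligibly
    // small relative to the largest magnitude found.
    static List<Integer> significant(double[] eigenvalues, double relTol) {
        double max = 0.0;
        for (double v : eigenvalues) {
            if (!Double.isNaN(v)) {
                max = Math.max(max, Math.abs(v));
            }
        }
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < eigenvalues.length; i++) {
            double v = eigenvalues[i];
            if (!Double.isNaN(v) && Math.abs(v) > relTol * max) {
                keep.add(i);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        // Eigenvalues as logged for the rank-3 run quoted below in this thread.
        double[] logged = {0.0, 44.794928654801566, 347.8286920203704};
        System.out.println(significant(logged, 1e-9));
    }
}
```

The caller would then keep only the eigenvectors at the returned indices,
which may be fewer than the requested rank.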

-Trevor

> Ah, if your matrix only has 2 columns, you can't go to rank 10.  Try on
> some slightly less synthetic data of more than rank 10.  You can't
> ask Lanczos for more reduced rank than that of the matrix itself.
>
>   -jake
>
> 2011/6/23 <[email protected]>
>
>> Alright, I can reorder; that is easy. I just had to verify that the
>> ordering was correct. When I increased the rank of the results, Lanczos
>> bailed out, which incidentally causes a NullPointerException:
>>
>> INFO: 9 passes through the corpus so far...
>> WARNING: Lanczos parameters out of range: alpha = NaN, beta = NaN. Bailing out early!
>> INFO: Lanczos iteration complete - now to diagonalize the tri-diagonal auxiliary matrix.
>> Exception in thread "main" java.lang.NullPointerException
>>        at org.apache.mahout.math.DenseVector.assign(DenseVector.java:133)
>>        at org.apache.mahout.math.decomposer.lanczos.LanczosSolver.solve(LanczosSolver.java:160)
>>        at pca.PCASolver.solve(PCASolver.java:53)
>>        at pca.PCA.main(PCA.java:20)
>>
>> I should probably note that my data only has 2 columns; the real data
>> will have quite a bit more.
>>
>> The failure happens with a rank of 10 or more, with the last, and
>> therefore most significant, eigenvector being <NaN,NaN>.
>>
>> -Trevor
>> > The 0 eigenvalue output is not valid, and yes, the output will list the
>> > results in *increasing* order, even though it is finding the largest
>> > eigenvalues/vectors first.
>> >
>> > Remember that convergence is gradual, so if you only ask for 3
>> > eigenvectors/values, you won't be very accurate.  If you ask for 10 or
>> > more, the largest few will now be quite good.  If you ask for 50, now the
>> > top 10-20 will be *extremely* accurate, and maybe the top 30 will still
>> > be quite good.
>> >
>> > Try out a non-distributed form of what is in the EigenVerificationJob to
>> > re-order the output and collect how accurate your results are (it
>> > computes errors for you as well).
>> >
>> >   -jake
>> >
>> > 2011/6/23 <[email protected]>
>> >
>> >> So, I know that MAHOUT-369 fixed a bug with the distributed version of
>> >> the LanczosSolver, but I am experiencing a similar problem with the
>> >> non-distributed version.
>> >>
>> >> I send a dataset of Gaussian-distributed numbers (testing PCA stuff) and
>> >> my eigenvalues are seemingly reversed. Below I have the output given in
>> >> the logs from LanczosSolver.
>> >>
>> >> Output:
>> >> INFO: Eigenvector 0 found with eigenvalue 0.0
>> >> INFO: Eigenvector 1 found with eigenvalue 347.8703086831804
>> >> INFO: LanczosSolver finished.
>> >>
>> >> So it returns a vector with eigenvalue 0 before one with an eigenvalue
>> >> of 347? What's more interesting is that when I increase the rank, I get
>> >> a new eigenvector with a value between 0 and 347:
>> >>
>> >> INFO: Eigenvector 0 found with eigenvalue 0.0
>> >> INFO: Eigenvector 1 found with eigenvalue 44.794928654801566
>> >> INFO: Eigenvector 2 found with eigenvalue 347.8286920203704
>> >>
>> >> Shouldn't the eigenvalues be in descending order? Also is the 0.0
>> >> eigenvalue even valid?
>> >>
>> >> Thanks,
>> >> Trevor
>> >>
>> >>
>> >
>>
>>
>>
>

