2010/11/23 Fernando Fernández <[email protected]>
> Hi Jake,
>
> You say that LanczosSolver leaves the eigenvalues in the reverse order that
> I expect, but at least the eigenvectors are given in the same order, aren't
> they? I mean, if I want the top N eigenvalues and eigenvectors, and I ask
> for 4N, I should take the last N eigenvalues and the last N eigenvectors,
> right?

The eigenvectors should be paired up with their values, yes. If you look at
the class org.apache.mahout.math.hadoop.decomposer.EigenVerificationJob,
you'll find the way I typically run the algorithm. I notice, in looking at
that, that it's not set up to handle symmetric matrices properly - it always
assumes the input matrix is asymmetric and that you're looking for singular
vectors (not eigenvectors of a symmetric matrix).

> I read also that there is an issue regarding the number of eigenvalues and
> eigenvectors that are returned when you ask LanczosSolver for N of them,
> and I have checked that if I ask for 40, for example, I get only 39. Will
> this leave out the biggest eigenvector and eigenvalue, or the smallest one,
> or none of these? I spent the last weeks reading about this in the
> discussions, but it's not very clear to me yet.

Your last eigenvector will be the largest one, and the first will be the
smallest one. You are getting one rank less than you asked for, however, and
that seems to be because the code is treating your initial guess vector (the
one you feed into the algorithm to start the iterative matrix multiplication)
as one of the possible eigenvectors (which it most certainly is not). That
might not be the issue, but either way, I think that if you want the biggest
10 eigenvectors, and you ask for 30 or 40 and take the last 10, you'll have
the biggest 10 (not the 2nd biggest through the 11th biggest).

> Could you give us some help?
>
> By the way, I'm interested in having Uk and also V*S^(-1), so I'm able to
> project new rows into the U space...

That's a simple matter of doing some extra matrix multiplications, right? The
output of the current implementation of Lanczos is K vectors, each with
dimensionality equal to M = numColumns(inputMatrix), and these naturally form
a dense, row-partitioned K x M matrix = V^t. If you scale each of these
singular vectors by 1/(their singular value), you'll have a set of K vectors
suitable for projecting new rows onto the reduced-dimensional space. (Of
course, you also need to project all of the original rows onto this space
too, because you have V^t, not U. In fact, if every matrix is being
represented as a row matrix, you don't really want U, because that's a matrix
of K singular vectors, each having numRows(inputMatrix) values - you want
numRows vectors, each having K values: U^t.)

Does that help?

  -jake
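
As a rough sketch of that projection step (assuming the Lanczos output has
already been loaded into memory as a list of org.apache.mahout.math.Vector
right-singular vectors paired with their singular values; the class, method,
and variable names below are only illustrative, not part of
EigenVerificationJob):

import java.util.List;

import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.Vector;

public final class RowProjector {

  /**
   * Projects a new row (dimension M = numColumns(inputMatrix)) onto the
   * reduced K-dimensional space, i.e. computes newRow * V * S^(-1).
   *
   * @param newRow         a row with the same dimensionality as the input rows
   * @param rightSingular  the K right singular vectors (rows of V^t) to keep
   * @param singularValues the corresponding singular values, in the same order
   */
  public static Vector project(Vector newRow,
                               List<Vector> rightSingular,
                               List<Double> singularValues) {
    int k = rightSingular.size();
    Vector projected = new DenseVector(k);
    for (int i = 0; i < k; i++) {
      // (newRow . v_i) / sigma_i -- scaling by 1/sigma is what turns V^t
      // into (V * S^(-1))^t, so the dot product lands in the U space.
      projected.set(i, newRow.dot(rightSingular.get(i)) / singularValues.get(i));
    }
    return projected;
  }
}

Given the ordering discussed above, the rightSingular / singularValues lists
passed in here would be the last K entries of what LanczosSolver returns (the
largest ones), and running project() over each original row as well as each
new row gives everything in the same K-dimensional space.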
